PREFACE


This  report  documents  the  work  performed  at  the  University  of  Oklahoma  under  SCEEE 
Subcontract  HER/90-0011  for  the  Armstrong  Laboratory,  Performance  Assessment  and  Interface
Technology  Branch  (AL/CFHP)  under  contract  F33615-88-D0532.  It  also  presents  the  results  of  a 
parallel  study  conducted  by  AL/CFHP.  The  effort  was  sponsored  by  the  Tri-Service  Office  of 
Military  Performance  Assessment  Technology  (OMPAT). 

As  outlined  in  the  Statement  of  Work  and  the  approved  Project  Design,  an  experimental  study 
was  conducted  to  provide  normative  data  and  a  better  understanding  of  a  subset  of  tasks  from  the 
Unified  Tri-Service  Cognitive  Performance  Assessment  Battery  (UTC-PAB).  In  addition  to 
providing  for  the  collection  and  summary  of  normative  data  for  tasks  from  the  AGARD 
Standardized  Tests  for  Research  with  Environmental  Stressors  (STRES),  the  Criterion  Task  Set 
(CTS),  and  a  subset  of  the  Walter  Reed  Army  Institute  of  Research  Performance  Assessment 
Battery  (WRAIR  PAB),  the  study  examined  issues  related  to  task  reliability,  comparability  of  tasks 
across  batteries,  group  vs.  individual  test  administration,  order  of  task  presentation  and  battery 
sequence,  test-retest  time  intervals,  imposition  of  response  deadlines,  extended  trial  lengths,  and 
the  usefulness  of  psychometric  state  measures. 

The  list  of  people  deserving  attribution  for  a  project  as  extensive  as  this  is  a  long  one.  The 
authors  gratefully  acknowledge  the  contributions  of  graduate  research  assistants  Randa  L.  Shehab, 
Scott  H.  Mills,  Patrick  L.  Foster,  and  Ioannis  Vasmatzidis  and  the  work  of  the  undergraduate 
support  team  (Mindy  Mitchell,  Rebecca  Kempner,  Tricia  Baird,  and  Tammy  Kasbaum)  in 
collection,  conversion,  summarization,  and  analysis  of  the  vast  amounts  of  data.  A  special  thanks 
goes  to  Mark  S.  Crabtree  of  Logicon  Technical  Services  Inc.  for  his  unselfish  support  of  our 
efforts  in  addition  to  his  collaborative  input  to  the  project  design  and  his  work  as  project 
coordinator  and  experimenter  for  the  parallel  effort  at  Armstrong  Laboratory.  Thanks  also  go  to 
Gary  B.  Reid  for  his  technical,  financial,  and  personal  support  of  our  work  over  several  years  and 
for  his  insightful  contributions  during  the  design  phase  of  this  project.  The  authors  thank  Dr. 
Dennis  L.  Reeves  for  his  role  in  the  development  of  the  UTC-PAB  AGARD  STRES  battery  and 
also  for  his  contributions  to  the  project  design,  and  acknowledge  the  skilled  and  timely 
programming  contributions  of  Kathy  M.  Winter,  Sam  J.  LaCour,  and  Kathy  Raynsford.  The 
Armstrong  Laboratory  effort  was  greatly  facilitated  by  Dr.  Herbert  Colle,  Wright  State  University, 
who  generously  provided  laboratory  space  and  equipment,  and  expedited  WSU  approval  of  the 
research.  Last  but  not  least,  the  authors  express  their  great  appreciation  to  Dr.  Frederick  W.  Hegge 
for  his  leadership  role  in  the  development  of  OMPAT,  his  contributions  to  the  design  of  this 
project,  and  his  funding  support.

SUMMARY

This  report  summarizes  the  development  and  analysis  of  a  comprehensive  normative  database 
for  a  large  subset  of  tasks  from  the  Unified  Tri-Service  Cognitive  Performance  Assessment  Battery 
(UTC-PAB).  The  tasks  were  members  of  the  AGARD  Standardized  Tests  for  Research  with 
Environmental  Stressors  (STRES),  the  Criterion  Task  Set  (CTS),  and  a  subset  of  the  Walter  Reed 
Army  Institute  of  Research  Performance  Assessment  Battery  (WRAIR  PAB).  Data  were  collected 
at  the  University  of  Oklahoma  and  in  a  parallel  study  conducted  by  Armstrong  Laboratory.  All 
data  were  analyzed  at  the  University  of  Oklahoma  to  address  issues  related  to  task  reliability, 
comparability  of  tasks  across  batteries,  group  vs.  individual  test  administration,  order  of  task 
presentation  and  battery  sequence,  test-retest  time  intervals,  imposition  of  response  deadlines, 
extended  trial  lengths,  and  the  usefulness  of  psychometric  state  measures. 

With  few  exceptions,  the  data  showed  remarkable  consistency  across  task  batteries  and  within 
task  types.  Task  reliability  varied  primarily  as  a  function  of  the  dependent  measure.  CTS  data 
showed  good  correspondence  to  a  previous  large-scale  CTS  database.  Task  presentation  order  and 
battery  sequence  did  not  influence  task  performance.  Response  deadlines  provided  a  faster  mean 
response  time  but  at  the  expense  of  more  missed  responses.  Extended  trial  lengths  had  a  more 
profound  effect  on  continuous  motor  tasks  such  as  Unstable  Tracking.  Changes  in  the 
psychometric  state  measures  of  sleepiness  and  mood  were  logical  reflections  of  time  on  task.

DEVELOPMENT  OF  THE  UTC-PAB  NORMATIVE  DATABASE


1.0  INTRODUCTION 

The  Tri-Service  Working  Group  of  the  Office  of  Military  Performance  Assessment 
Technology  (OMPAT)  undertook  a  program  to  develop  task  batteries  and  a  standardized  test 
methodology  for  human  performance  assessment.  One  major  part  of  this  assessment  effort  has 
been  the  development  of  the  Unified  Tri-Service  Cognitive  Performance  Assessment  Battery 
(UTC-PAB),  a  specialized  human  performance  task  battery  for  laboratory  and  field  research.  The 
UTC-PAB  consists  of  numerous  human  performance  tasks  organized  in  various  sub-batteries. 
One  of  the  most  recently  developed  and  more  sophisticated  of  these  batteries  is  the  UTC- 
PAB/AGARD  STRES  Battery  (Reeves,  Winter,  LaCour,  Winter,  Vogel,  and  Grissett,  1990). 
Other  UTC-PAB  supported  batteries  include  the  U.S.  Air  Force  Criterion  Task  Set  (CTS; 
Shingledecker,  1984;  Shingledecker,  Acton,  and  Crabtree,  1983)  and  the  Walter  Reed  Army 
Institute  of  Research  Performance  Assessment  Battery  (WRAIR  PAB;  Thorne,  Genser,  Sing,  and
Hegge,  1985). 

The  present  project  was  initiated  in  response  to  the  need  for  a  better  understanding  of  several 
tasks  in  the  UTC-PAB,  namely  the  subset  comprising  the  STRES  battery.  In  its  earliest  stages  the 
project  was  conceived  to  address  a  fairly  specific  need,  namely  the  development  of  a  normative 
database  for  the  STRES  battery.  This  was  expanded  to  focus  on  two  other  batteries  that  are  closely 
associated  with  OMPAT  efforts  in  battery  standardization  and  development,  the  CTS  and  the 
WRAIR  PAB.  A  fundamental  objective  of  the  effort  was  to  develop  an  integrated  database  that 
could  be  used  not  only  for  normative  data  comparisons,  but  also  to  answer  basic  questions 
regarding  how  subjects  respond  on  tasks  implemented  in  a  specific  battery,  and  how  task  behavior
varies  across  batteries  with  similar  tasks. 

As  the  project  evolved,  it  was  expanded  to  accommodate  several  basic  research  questions 
developed  by  the  principal  investigators.  Based  on  extensive  past  experience  with  the  Criterion 
Task  Set  (Schlegel  and  Gilliland,  1990)  and  more  recent  experience  with  the  UTC-PAB/AGARD 
STRES  battery  (Baird,  Kasbaum,  and  Schlegel,  1990),  other  basic  problems  were  identified  as 
logical  candidates  for  investigation.  First,  the  degree  to  which  response  deadlines  influence  the 
nature,  speed,  and  distribution  of  subject  responses  was  identified  as  an  important  problem. 
Second,  trial  length  was  believed  to  be  an  important  variable  in  determining  the  nature  of  the 
performance  obtained,  yet  very  little  is  known  about  this  variable  in  task  battery  construction.  It 
was  also  believed  that  trial  length  analysis  might  provide  a  possible  way  to  explore  the  dynamics  of
task  performance  over  time.  Third,  numerous  additional  questions  arose  regarding  the  reliability  of 
performance  on  the  selected  tasks.  And,  finally,  there  was  concern  for  the  effects  of  the  sequence 
in  which  tasks  (or  batteries)  are  presented  to  subjects  and  whether  any  carryover  performance 
effects  (either  learning  or  fatigue)  existed.  For  these  reasons,  response  deadlines,  trial  length,  task 
order  and  battery  sequence,  along  with  reliability  were  among  the  major  foci  of  the  current  project. 

This  report  summarizes  the  experimental  design  and  methods  used  to  develop  the  normative 
database  and  to  address  the  outlined  research  questions.  It  also  provides  statistical  summaries  and 
analyses  of  the  performance  data.  Section  2.0  (Background)  provides  a  description  of  the 
development  of  the  UTC-PAB.  Specific  information  regarding  the  creation  of  OMPAT  and  the
evolution  of  the  STRES  battery,  the  CTS,  and  the  WRAIR  PAB  is  presented.  Section  3.0
(Establishing  a  UTC-PAB  Normative  Database)  presents  the  specific  research  goals  of  the  project. 
This  section  is  followed  by  Section  4.0  (Project  Design  and  Method)  which  provides  an  extensive 
overview  of  the  methodology  and  procedures  used  in  the  project.  Section  5.0  (Project  Results) 
presents  the  normative  data,  as  well  as  analyses  of  the  additional  research  problems  addressed  in 
this  project.  Finally,  Section  6.0  (Summary)  provides  a  brief  list  of  the  major  research  findings  of
the  project.  Extensive  appendices  that  provide  additional  detailed  information  about  the  project  data
complete  the  report.


2.0  BACKGROUND 


2.1  Need  for  Performance  Assessment  Batteries 

As  advances  in  technology  increase  the  complexity  of  various  operational  environments,  it  has 
become  increasingly  important  to  develop  methods  for  assessing  and  predicting  the  nature  and 
amount  of  workload  associated  with  specific  operator  tasks.  Greater  demands  are  now  placed  on 
designers  to  include  a  priori  evaluations  of  not  only  the  amount,  but  also  the  type  of  work  required 
in  newly  designed  work  environments.  In  addition,  a  renewed  interest  in  the  effects  of  various 
environmental  stressors  has  served  to  promote  the  need  for  highly  reliable  and  valid  measures  of 
human  performance. 

Developments  in  at  least  two  areas  have  enabled  this  need  to  be  addressed.  First,  many 
evolving  theories  of  cognition  and  human  performance  have  played  important  roles  in  defining 
both  the  theoretical  and  practical  limits  of  mental  work  capacity  (e.g.,  Broadbent,  1958;  Kerr, 
1973;  Navon  and  Gopher,  1979;  Norman,  1968;  Sanders,  1979;  Treisman,  1969;  Wickens, 
1980).  Second,  technological  advances,  especially  microprocessor  developments,  have  allowed 
more  intricate  levels  of  task  modeling,  more  control  over  task  presentation,  and  more  accurate  and 
enlarged  data  collection  ability.  The  linking  of  cognitive  theory  developments  with  recent 
microprocessor  improvements  has  led  to  the  construction  of  a  number  of  sophisticated  human 
performance  task  batteries  that  provide  considerable  promise  for  advancing  both  theory  and 
application  in  a  variety  of  fields.  However,  some  of  these  batteries  appear  to  be  more 
advantageous  than  others  due  to  more  sophisticated  levels  of  implementation,  ease  of  use,  and 
greater  linkage  to  both  applied  and  theoretical  research. 

The  Office  of  Military  Performance  Assessment  Technology  (OMPAT)  has  engaged  in  a 
program  to  develop  a  standardized  test  methodology  and  task  battery  for  human  performance 
assessment.  Among  OMPAT's  numerous  accomplishments  thus  far  are  two  advances  that  bear
directly  on  this  project.  First,  OMPAT  has  established  a  superordinate  pool  of  candidate  human 
performance  tasks  identified  as  the  Unified  Tri-Service  Cognitive  Performance  Assessment  Battery 
(UTC-PAB).  This  UTC-PAB  pool  of  tasks  is  organized  largely  in  subsets  or  batteries.  Second, 
OMPAT  has  established  the  Performance  Information  Management  System  (PIMS)  to  serve  as  a 
clearinghouse  for  UTC-PAB  databases.  Thus,  as  research  is  conducted  on  various  UTC-PAB 
tasks,  data  from  these  studies  can  be  consolidated  and  disseminated  through  a  central,  organized 
framework. 

This  report  documents  a  major  research  effort  to  explore  three  UTC-PAB  subsets.  Aside  from 
contributing  to  the  normative  database  on  the  included  tasks,  the  project  explored  numerous 
research  questions  regarding  the  interrelationships  of  the  batteries  and  tasks.  The  project  also
examined  how  selected  variables  believed  to  influence  human  task  performance  actually  affect 
performance  on  these  tasks. 

2.2  Unified  Tri-Service  Cognitive  Performance  Assessment  Battery  (UTC-PAB) 

As  outlined  by  Englund,  Reeves,  Shingledecker,  Thorne,  Wilson,  and  Hegge  (1985,  1987;  see
also  Perez,  Masline,  Ramsey,  and  Urban,  1987),  the  concept  for  the  Unified  Tri-Service  Cognitive 
Performance  Assessment  Battery  (UTC-PAB)  evolved  from  the  Military  Performance  Working 
Group  in  1983.  This  group  proposed  the  UTC-PAB  as  a  primary  measurement  instrument  for  the 
evaluation  of  cognitive  performance  within  the  framework  of  a  larger  multi-level  biomedical  drug 
evaluation  program.  These  efforts  gave  rise  to  more  detailed  task  specifications  for  the  UTC-PAB 
through  the  actions  of  the  Joint  Working  Group  on  Drug  Dependent  Degradation  of  Military 
Performance  (JWGD3  MILPERF  -  Task  Area  Group  workshop  in  November  1984,  at  the  Naval
Medical  Research  Institute,  Bethesda,  Maryland).  This  joint  working  group  was  the  predecessor 
of  the  Office  of  Military  Performance  Assessment  Technology  (OMPAT). 

This  period  saw  the  development  of  several  human  performance  task  batteries  such  as  the 
U.S.  Air  Force  Criterion  Task  Set  (CTS;  Shingledecker,  1984),  the  Walter  Reed  Army  Institute  of 
Research  Performance  Assessment  Battery  (WRAIR  PAB;  Thorne,  Genser,  Sing,  and  Hegge, 
1985),  and  others  (e.g.,  Bittner,  Carter,  Kennedy,  Harbeson,  and  Krause,  1984).  A  major
contribution  by  OMPAT  was  to  bring  together  the  most  theoretically  representative  and  practically 
relevant  tasks  from  these  numerous  sources  into  one  standardized  format  (Hegge,  Reeves,  Poole, 
and  Thorne,  1985;  Englund  et  al.,  1985,  1987).  At  that  point  in  its  development,  the  UTC-PAB
represented  a  pool  of  approximately  25  human  performance  tasks  that  were  believed  to  assess 
various  stages  of  cognitive  processing,  as  well  as  both  selective  and  divided  attention  functions. 
The  UTC-PAB  was  also  envisioned  as  a  dynamic  task  battery  system  that  would  evolve  over  time 
(presumably,  as  tasks  were  added,  modified,  or  removed),  and  could  be  used  flexibly  by  adopting 
a  "core"  subset  of  tasks,  or  by  constructing  unique  subsets  of  tasks  for  project-specific  purposes. 

An  important  related  development  in  the  evolution  of  the  UTC-PAB  was  a  meeting  of  the 
NATO  Advisory  Group  for  Aerospace  Research  and  Development  (AGARD),  Aerospace  Medical 
Panel  Working  Group  12  on  Human  Performance  Assessment  Methods  (AGARD,  1989).  The 
focal  point  of  this  meeting  was  to  address  the  need  for  a  task  battery  to  investigate  the  influence  of 
environmental  stressors  on  human  performance.  Working  Group  12  was  formed  to  review  the 
relevant  human  task  performance  literature  and  to  select  a  subset  of  tasks  that  might  be  optimally 
combined  to  provide  the  AGARD  Standardized  Tests  for  Research  with  Environmental  Stressors 
(AGARD  STRES)  Battery.  As  noted  in  their  report,  the  panel  concluded  that  their  effort  to  design
the  AGARD  STRES  Battery  could  be  considered  an  extension  of  the  OMPAT  UTC-PAB  approach 
to  battery  construction.  Although  the  AGARD  group  had  a  much  more  specific  purpose  in  mind, 
they  adopted  OMPAT's  general  approach  of  identifying  task  subsets,  and  they  also  selected  tasks
that  were,  for  the  most  part,  already  included  in  the  OMPAT  UTC-PAB.  For  example,  OMPAT 
had  already  incorporated  many  of  the  tasks  from  the  USAF  Criterion  Task  Set.  AGARD  provided 
specific  recommendations  and  parameters  for  task  presentations,  but  not  specific  computer 
programs  requiring  specific  computer  equipment  configurations.  Thus,  from  a  broader 
perspective,  the  AGARD  STRES  Battery  can  be  viewed  as  a  subset  of  the  UTC-PAB  that  has  been 
more  highly  defined  on  one  hand,  while  being  presented  in  a  more  "machine-independent"  manner 
on  the  other. 

In  response  to  the  AGARD  recommendations,  OMPAT  supported  an  effort  to  construct  an 
AGARD  STRES  implementation  within  the  framework  of  the  UTC-PAB.  This  battery  has  been 
officially  designated  the  UTC-PAB/AGARD  STRES  Battery  (Reeves,  Winter,  LaCour,  Winter, 
Vogel,  and  Grissett,  1990).  It  will  be  referred  to  in  this  report  as  the  "STRES  Battery."  Thus,  the 
STRES  Battery  is  the  latest  and  perhaps  the  most  sophisticated  battery  to  emerge  from  the  OMPAT 
UTC-PAB  program.  However,  at  the  present  time,  the  STRES  Battery,  the  CTS  and  the  WRAIR 
PAB  together  probably  play  the  most  prominent  roles  as  applied  task  batteries  within  the  UTC- 
PAB  framework.  It  is  also  important  to  note  that  each  of  these  primary  batteries  was  designed  to 
address  a  specific  research  application  as  outlined  in  the  brief  overviews  that  follow. 

2.2.1  Standardized  Tests  for  Research  with  Environmental  Stressors  (STRES) 

According  to  the  original  AGARD  report,  the  AGARD  STRES  Battery  was  designed  to 
evaluate  the  effects  of  environmental  stressors  on  selected  aspects  of  cognitive  performance.  For
this  reason,  the  specific  tasks  recommended  by  the  AGARD  Working  Group  were  chosen  for  their 
conformity  to  basic  Human  Performance  Theory  (AGARD,  1989).  In  other  words,  these  tasks  are 
typically  short  duration,  repetitive,  highly-structured  information  processing  tasks  with  well- 
defined  stimuli  linked  readily  to  simply  structured  responses. 

As  noted  above,  the  UTC-PAB/AGARD  STRES  Battery  (STRES  Battery)  is  the  latest  battery 
to  emerge  from  the  broader  OMPAT  UTC-PAB  effort  (Reeves  et  al.,  1990).  Working  assiduously 
within  the  OMPAT  UTC-PAB  guidelines,  Reeves  and  his  colleagues  drew  from  the  UTC-PAB 
task  pool  selected  versions  of  tasks  recommended  by  the  AGARD  report.  These  tasks  were  then 
combined  in  one  subset,  and  programmed  for  a  standardized  computer  hardware  configuration 
compatible  with  previous  UTC-PAB  task  implementations.  Currently,  the  STRES  Battery 
includes  seven  tasks  that  conform  to  the  original  AGARD  STRES  Battery  recommendations
(AGARD,  1989),  and  now  represents  one  of  the  most  advanced  and  consensually  supported
batteries  within  OMPAT's  UTC-PAB  program.  The  tasks  include:  Reaction  Time,  Mathematical 
Processing,  Memory  Search,  Spatial  Processing,  Unstable  Tracking,  Grammatical  Reasoning,  and 
an  Unstable  Tracking/Memory  Search  dual  task.  More  detailed  descriptions  of  the  STRES  Battery 
tasks  are  provided  in  Section  4.3. 

2.2.2  Criterion  Task  Set  (CTS) 

The  CTS  is  a  human  performance  battery  developed  as  a  tool  to  facilitate  the  evaluation  of 
mental  workload  metrics  (Shingledecker,  1984;  Shingledecker,  Acton,  and  Crabtree,  1983).  In 
this  regard,  the  CTS  was  originally  designed  to  provide  a  set  of  standardized  loading  tasks  to 
evaluate  the  relative  sensitivity,  reliability,  and  intrusiveness  of  a  variety  of  proposed  behavioral, 
subjective,  and  physiological  indices  of  workload.  The  CTS  was  thus  designed  as  a  set  of 
"benchmark"  tests  with  which  project-specific  workload  measures  could  be  calibrated  or 
compared.  Of  course,  in  addition  to  this  benchmark  function,  the  CTS  has  also  been  used  as  a 
standardized  task  battery  for  human  performance  assessment. 

Perhaps  the  most  important  feature  of  the  CTS  is  the  fact  that  it  was  one  of  the  first  human 
performance  batteries  to  be  based  on  current  information  processing  theories  (i.e.,  Multiple
Resource  Theory  -  Wickens,  1992;  and  Processing  Stage  Theory  -  Sternberg,  1969).  According 
to  these  theories,  human  mental  performance  is  dependent  on  a  number  of  stages,  information 
processing  resources,  and  specific  functions.  The  CTS  model  hypothesizes  three  primary  stages 
of  processing:  perceptual  input,  central  processing,  and  motor  output.  There  are  specific  mental 
processing  resources  associated  with  the  input  mode  (either  visual  or  auditory),  the  type  of  coding 
during  central  processing  (either  spatial/imaginal  or  abstract/symbolic),  and  the  mode  of  response 
output  (either  manual  or  vocal).  Also,  the  central  processing  stage  is  further  divided  to  emphasize 
memory/recall  functions  and  elementary  mental  activities  such  as  information  manipulation, 
reasoning,  and  planning/scheduling. 

This  model  was  used  to  guide  the  selection  of  CTS  tasks  which  would  be  representative  of  the 
range  of  human  operator  performance.  This  was  accomplished  by  operationally  defining  each 
element  in  the  model  in  terms  of  the  task  characteristics  associated  with  the  resources  required  by 
the  element.  For  example,  resources  associated  with  the  visual  perceptual/input  element  were 
defined  in  terms  of  the  task  characteristics  of  stimulus  discriminability  and  numerosity  of  display 
sources.  These  characteristics  would  be  represented  by  tasks  requiring  simple  detection  as  well  as 
monitoring  and  scanning.  Additionally,  it  was  recognized  that  any  task  is  likely  to  make  demands 
at  all  processing  stages.  Thus,  when  actually  selecting  a  candidate  task  for  a  specific  element  of  the
model,  such  as  visual  perceptual/input,  the  loading  demands  on  central  processing  and 
motor/output  elements  were  minimized. 

An  additional  feature  of  the  CTS  battery  of  tasks  is  that  (except  for  Interval  Production)  three 
versions  of  each  task  are  included  to  provide  graded  loading  levels  (i.e.,  an  easy,  moderately 
difficult  and  difficult  version).  In  this  manner,  the  CTS  actually  provides  a  task  taxonomy  for 
evaluating  workload  metric  sensitivity  along  the  dimensions  of  mental  workload  type  (i.e., 
resource/stage)  and  workload  level  (i.e.,  difficulty).  This  feature  of  the  CTS  also  allows 
investigation  of  the  sensitivity  of  stressor  effects  on  performance. 
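
The  two-dimensional  taxonomy  described  above  (workload  type  by  workload  level)  can  be  sketched  as  a  small  data  structure.  The  following  is  a  minimal  illustration  only,  not  part  of  the  CTS  software.  The  class  and  function  names  are  hypothetical,  and  the  assignment  of  each  task  to  a  primary  processing  stage  is  an  assumption  made  for  the  example  rather  than  a  statement  from  this  report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The three primary processing stages named in the CTS model."""
    PERCEPTUAL_INPUT = "perceptual input"
    CENTRAL_PROCESSING = "central processing"
    MOTOR_OUTPUT = "motor output"


class Level(Enum):
    """Graded loading levels provided for most CTS tasks."""
    EASY = 1
    MODERATE = 2
    DIFFICULT = 3


@dataclass(frozen=True)
class TaskVersion:
    task: str
    stage: Stage
    level: Optional[Level]  # None marks a task offered at a single level


# Illustrative stage assignments; these are assumptions for the sketch.
PRIMARY_STAGE = {
    "Display Monitoring": Stage.PERCEPTUAL_INPUT,
    "Memory Search": Stage.CENTRAL_PROCESSING,
    "Unstable Tracking": Stage.MOTOR_OUTPUT,
    "Interval Production": Stage.MOTOR_OUTPUT,
}

# The text notes that Interval Production has no graded versions.
SINGLE_LEVEL_TASKS = {"Interval Production"}


def build_taxonomy(primary_stage, single_level_tasks):
    """Return one TaskVersion per (task, loading level) cell."""
    cells = []
    for task, stage in primary_stage.items():
        levels = [None] if task in single_level_tasks else list(Level)
        cells.extend(TaskVersion(task, stage, level) for level in levels)
    return cells


taxonomy = build_taxonomy(PRIMARY_STAGE, SINGLE_LEVEL_TASKS)
```

Indexing  each  task  version  by  stage  and  level  in  this  way  makes  it  straightforward  to  select,  for  example,  all  difficult  versions  or  all  tasks  loading  a  given  stage  when  comparing  the  sensitivity  of  a  workload  measure.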

A  wide  range  of  tasks  from  the  literature  on  cognitive  and  psychomotor  performance  was 
screened  according  to  the  resource  theory  outlined  above.  The  screening  process  resulted  in  the 
selection  of  nine  tasks  for  CTS  Version  1.0.  Initial  parametric  studies  were  completed  to  determine 
estimates  of  training  time  needed  for  each  task,  to  determine  task  pacing  rates,  and  to  establish 
standard  task  loading  levels.  The  standard  loading  levels  were  determined  through  comparison  of 
post-asymptotic  performance  measures  and  were  corroborated  by  subjective  ratings  of  task 
difficulty  and  complexity  (Shingledecker,  1984). 

Detailed  task  descriptions  and  the  results  of  the  initial  parametric  studies  are  provided  by 
Shingledecker  (1984).  Training  performance  and  training  requirements  for  the  tasks  when 
presented  as  a  complete  battery  are  given  in  Schlegel  (1986).  Results  from  a  large  scale  normative 
data  collection  study  are  presented  in  Schlegel  and  Gilliland  (1990).  Based  primarily  on  the  results 
of  the  latter  effort,  several  of  the  tasks  were  modified  in  the  development  of  CTS  Version  2.0. 

CTS  Version  2.0  consists  of  nine  tasks  including:  Display  Monitoring,  Continuous 
Recognition,  Memory  Search,  Linguistic  Processing,  Mathematical  Processing,  Spatial 
Processing,  Grammatical  Reasoning,  Unstable  Tracking,  and  Interval  Production.  In  addition  to 
the  comprehensive  normative  database  study  (Schlegel  and  Gilliland,  1990),  the  CTS  tasks  as  a 
whole  or  in  part  have  been  successfully  used  as  dependent  measures  in  numerous  human 
performance  studies  in  settings  as  wide-ranging  as  cockpit  workload  analyses,  evaluations  of 
vibrotactile  helmet-mounted  displays  (Lambert,  1990;  Schlegel  and  Gilliland,  1990),  and  the 
establishment  of  index  levels  of  excessive  workload  (Schlegel  and  Gilliland,  1986;  Schlegel, 
Schlegel  and  Gilliland,  1988).  More  detailed  descriptions  of  the  CTS  tasks  examined  in  this  effort 
are  provided  in  Section  4.3.

2.2.3  Walter  Reed  Performance  Assessment  Battery  (WRAIR  PAB)

The  Walter  Reed  Army  Institute  of  Research  Performance  Assessment  Battery  (WRAIR  PAB) 
has  been  in  development  and  use  for  a  number  of  years  (Thorne  et  al.,  1983).  Like  the  STRES
Battery,  the  WRAIR  PAB  was  designed  primarily  as  a  means  for  assessing  the  influence  of 
treatment  effects  (stressors,  drugs,  etc.),  especially  within  the  context  of  investigations  utilizing 
repeated  measures.  To  accommodate  this  type  of  research,  the  WRAIR  PAB  emphasizes  relatively 
brief  tasks  and  provides  nearly  unlimited  alternate  "test  forms"  through  the  use  of  an  automated 
configuration  file  system.  Once  constructed,  the  configuration  file  will  allow  the  researcher  to 
present  the  same  or  different  task  battery  sequences  automatically  for  any  given  set  of  project- 
specific  needs. 
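
The  idea  of  a  configuration  file  that  drives  the  same  or  different  task  sequences  can  be  illustrated  with  a  short  sketch.  This  is  not  the  WRAIR  PAB  file  format,  which  is  not  described  in  this  report.  The  configuration  keys,  the  function  name,  and  the  seeding  scheme  are  all  hypothetical  and  are  chosen  only  to  show  how  a  single  stored  configuration  can  reproduce  a  fixed  sequence  or  generate  a  repeatable  sequence  per  session.

```python
import random

# Hypothetical configuration; the task names come from the WRAIR PAB task
# list, but the keys and their meanings are assumptions for this sketch.
CONFIG = {
    "tasks": ["Logical Reasoning", "Digit Recall", "Manikin", "Stroop"],
    "randomize_order": True,
    "base_seed": 1990,
}


def battery_sequence(config, session):
    """Return the task order for one session.

    With randomize_order off, every session gets the configured order.
    With it on, the order is shuffled using a seed derived from the
    session number, so a given session can always be regenerated.
    """
    tasks = list(config["tasks"])
    if config["randomize_order"]:
        rng = random.Random(config["base_seed"] * 1000 + session)
        rng.shuffle(tasks)
    return tasks


# Same configuration and session number always yield the same sequence.
first_run = battery_sequence(CONFIG, session=1)
second_run = battery_sequence(CONFIG, session=1)

# A fixed-order configuration presents the tasks exactly as listed.
fixed_config = dict(CONFIG, randomize_order=False)
fixed_run = battery_sequence(fixed_config, session=7)
```

Deriving  the  seed  from  the  session  number  is  one  simple  way  to  keep  repeated-measures  sessions  reproducible  while  still  varying  the  presentation  order  between  sessions.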

The  WRAIR  PAB  includes  the  following  tasks:  Encoding/Decoding,  2-Letter  and  6-Letter 
Visual  Search,  2-Column  Addition,  Logical  Reasoning,  Digit  Recall,  Serial  Addition/Subtraction, 
Pattern  Recognition,  Wilkinson  Serial  Reaction  Time,  Choice  Reaction  Time,  Time  Wall,  Interval 
Production,  Manikin,  Stroop,  Code  Substitution,  Matching  to  Sample,  Delayed  Recall  and  a 
number  of  self-assessments  of  physical  and  mental  states.  More  detailed  descriptions  of  the 
WRAIR  PAB  tasks  used  in  this  project  are  provided  in  Section  4.3. 

2.3  Need  for  a  UTC-PAB  Normative  Database  Study 

The  efforts  by  OMPAT  outlined  above  have  resulted  in  important  advances  both  in  developing 
standardized  human  performance  task  batteries  and  in  communicating  the  results  of  task  battery 
research.  The  STRES  Battery,  the  CTS,  and  the  WRAIR  PAB  have  each  provided  unique  solutions  to  major
assessment  problems,  and  the  establishment  of  the  Performance  Information  Management  System 
has  provided  the  mechanism  for  more  effectively  and  efficiently  sharing  task  battery  databases. 

The  study  reported  here  focused  on  basic  research  questions  regarding  issues  in  task  battery 
administration  and  use.  Prior  to  this  effort,  very  little  normative  data  existed  for  some  of  these 
batteries,  such  as  the  new  STRES  Battery.  Even  in  cases  where  some  data  existed,  such  as  the 
CTS  database  (Schlegel  and  Gilliland,  1990),  these  data  needed  to  be  updated  and  compared  to 
data  from  new  versions  of  the  batteries.  Also  needed  was  a  considerable  amount  of  basic  research 
regarding  the  reliability  and  validity  of  the  task  batteries,  the  degree  to  which  these  batteries  relate 
to  one  another,  and  additional  explorations  of  variables  that  may  affect  task  battery  performance 
more  globally.  The  present  study  was  designed  to  simultaneously  address  a  number  of  these 
issues.  The  following  section  outlines  in  detail  the  nature  of  this  research  effort.


3.0  ESTABLISHING  A  UTC-PAB  NORMATIVE  DATABASE

In  understanding  any  large-scale,  multifaceted  project  of  this  nature,  it  is  usually  helpful  to 
first  gain  a  global  perspective  of  the  project,  which  then  aids  in  a  more  complete  comprehension  of 
the  specific  research  goals.  So  that  the  reader  can  more  easily  assimilate  the  more  complex  details 
of  the  design  and  methods  found  in  Section  4.0,  this  section  of  the  report  begins  with  an  overview 
of  the  origins  of  the  research  project,  as  well  as  the  rationale  and  context  of  the  project.  Following
this  introduction  are  more  detailed  discussions  of  the  purposes  and  goals  of  the  project.

This  project  represents  a  large  comprehensive  research  effort  drawing  upon  the  collaborative 
input  of  numerous  individuals  across  a  number  of  research  laboratories.  In  its  earliest  stages,  the 
project  was  conceived  to  address  a  fairly  specific  need,  namely  the  development  of  a  normative 
database  for  the  UTC-PAB.  As  the  project  evolved,  it  was  expanded  to  accommodate  several  basic 
research  questions  that  were  generated  by  the  principal  investigators  and  by  researchers  at  OMPAT
and  the  Performance  Assessment  and  Interface  Technology  Branch  at  Armstrong  Laboratory, 
Wright-Patterson  AFB. 

The  present  project  began  in  response  to  the  need  for  a  better  understanding  of  several  tasks  in 
the  UTC-PAB,  namely  the  subset  comprising  the  STRES  battery.  The  first  version  of  the  STRES 
Battery  was  completed  in  mid-1990  (see  Reeves,  Winter,  LaCour,  Winter,  Vogel,  and  Grissett,
1990)  and  was  used  with  reasonable  success  for  various  pilot  research  projects.[1]  However,  as  is
common  with  the  first  versions  of  test  batteries,  this  early  use  of  the  STRES  Battery  suggested  the 
need  for  a  number  of  modifications  prior  to  widespread  release.  Also,  little  was  known  regarding 
the  data  one  might  expect  from  the  STRES  Battery.  Certainly  these  tasks  had  been  used  before  in 
various  laboratory  settings,  but  how  subjects  would  respond  to  them  as  implemented  in  the  STRES 
Battery  was  unknown.  This  is  not  an  uncommon  problem.  Few  of  the  available  task  batteries 
have well-defined normative databases. Another OMPAT-supported battery, the CTS, is an
obvious  exception  (see  Schlegel  and  Gilliland,  1990).  However,  even  the  CTS  has  undergone  a 
revision  since  this  early  database  was  developed.  Thus,  one  very  basic  need  was  for  the 
development  of  a  normative  database,  preferably  one  that  could  be  used  not  only  to  determine  how 
subjects  respond  on  tasks  implemented  in  a  specific  battery,  but  also  to  provide  some  capacity  to 
cross-reference  to  similar  tasks  in  other  batteries.  For  this  purpose,  the  present  project  focused  on 
three  batteries  that  are  closely  associated  with  OMPAT  efforts  in  battery  standardization  and 
development:  the  STRES  Battery,  the  CTS,  and  the  WRAIR  PAB. 


1  The  authors  would  like  to  acknowledge  the  leadership  role  of  Dr.  Frederick  W.  Hegge  and  Dr.  Dennis  L.  Reeves  in 
the  development  of  the  UTC-PAB  AGARD  STRES  battery,  and  the  skilled  programming  contributions  made  by 
Ms.  Kathy  M.  Winter,  Mr.  Sam  J.  LaCour,  and  Ms.  Kathy  Raynsford. 

At this point, a number of other basic research questions began to emerge for human
performance  battery  researchers.  Clearly,  there  were  attempts  to  increase  the  standardization  of 
task  batteries.  Both  the  efforts  of  OMPAT  (i.e.,  UTC-PAB)  and  those  of  the  AGARD  Working 
Group  were  vivid  examples.  However,  even  with  concerted  efforts  such  as  these  to  standardize 
both  the  nature  of  the  tasks  in  a  battery  and  the  administration  procedures,  there  remains  the  simple 
problem  that  numerous  batteries  exist  -  batteries  that  may  share  the  same  types  of  tasks,  but  may 
still  differ  in  subtle  yet  important  ways.  Therefore,  one  question  that  must  be  addressed  is  whether 
versions  of  the  same  task  implemented  within  two  different  batteries  share  similar  response 
characteristics. 

Based  on  extensive  past  experience  with  the  Criterion  Task  Set  (Schlegel  and  Gilliland,  1990) 
and  more  recent  experience  with  the  UTC-PAB/AGARD  STRES  battery  (Baird,  Kasbaum,  and 
Schlegel,  1990),  other  basic  problems  have  been  identified  as  logical  candidates  for  investigation. 
First,  the  degree  to  which  deadline  conditions  affect  the  nature,  speed,  and  distribution  of  subject 
responses  is  an  important  problem.  There  are  similar  concerns  regarding  performance  differences 
as  a  function  of  trial  length.  Another  issue  concerns  the  sequence  in  which  tasks  are  presented  to 
subjects  and  whether  any  carryover  effects  (either  learning  or  fatigue)  affect  performance. 
Numerous  additional  questions  arise  regarding  the  reliability  of  performance  on  the  selected  tasks. 
For these reasons, deadline conditions, trial length, and task sequence, along with reliability, were
among  the  major  foci  of  the  current  project. 

Finally,  research  on  task  performance  can  become  costly  both  in  terms  of  time  and  money 
because  human  performance  studies  such  as  these  require  subjects  who  are  well-trained  on  the 
tasks.  The  cost  of  this  training  time  adds  to  each  individual  study.  One  technique  explored  in  this 
project  is  the  establishment  of  a  pool  of  subjects  who,  through  their  involvement  in  this  initial 
study,  are  all  well-trained  on  the  various  task  batteries.  By  maintaining  this  pool  of  experienced 
subjects,  selective  future  studies  can  be  performed  more  economically. 

As  outlined  above,  this  project  was  a  comprehensive  research  effort  aimed  at  providing  several 
research  groups  with  data  of  mutual  interest.  These  data  range  from  fairly  basic  normative  data 
through fundamental reliability and validity data to more theory-driven experimental data. Each of
the specific research goals of this project is presented in more detail below.

3.1  Normative  Database  Development 

Computer-implemented human performance tasks, such as those developed by OMPAT, are
analogues  of  everyday  tasks  or  contain  the  essential  components  of  everyday  tasks.  They  are  tools 
used frequently by behavioral and medical researchers for personnel selection or for experimental
evaluation.  In  personnel  selection,  standardized  batteries  can  be  used  to  screen  large  numbers  of 
applicants  for  selected  abilities  (e.g.,  pilot  selection).  The  typical  use  of  these  tasks  for 
experimental  evaluations  employs  one  or  more  tasks  administered  both  under  a  control  or  baseline 
condition  and  under  a  treatment  or  experimental  condition.  Experimental  conditions  typically 
include  factors  that  have  the  potential  for  influencing  operator  performance  such  as  external  factors 
(drugs,  workload,  time  pressure,  time  of  day),  internal  factors  (fatigue,  motivation,  effort),  and 
environmental  stressors  (heat,  noise,  vibration).  These  types  of  controlled  experiments  provide 
useful,  cost-effective  ways  to  assess  the  risks  associated  with  numerous  factors  in  the  work 
environment. 

However,  the  usefulness  of  task  batteries  for  such  personnel  and  experimental  purposes  is 
largely  dependent  on  the  quality  of  the  tasks  and  the  degree  to  which  the  response  characteristics  of 
the  tasks  are  known.  In  order  to  enhance  the  investigator's  ability  to  provide  accurate  placement  or 
isolate  treatment  effects,  it  is  necessary  to  know  precisely  how  people  perform  on  the  tasks  under 
laboratory  baseline  conditions.  With  the  aid  of  a  carefully  developed  normative  database,  other 
investigators  can  thoughtfully  structure  their  testing  conditions  to  better  replicate  those  conditions 
used  to  generate  the  database.  They  can  also  determine  whether  their  baseline  data  are  within 
reasonable  expectations,  thereby  "calibrating"  their  use  of  the  task  battery  and  providing 
reassurance  that  both  their  equipment  and  subjects  are  performing  effectively.  In  addition,  there 
are  certain  situations  where  pretest  baseline  data  are  difficult  or  impossible  to  collect.  In  such 
situations,  the  pre-existing  database  can  serve  as  normative  data  to  which  experimental  data  can  be 
compared. 
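
As an illustration of the "calibration" use described above, the following minimal Python sketch expresses each subject's baseline score in normative standard-deviation units and flags scores that fall outside a tolerance. The function names, the tolerance values, and all numerical data are hypothetical assumptions for illustration; they are not drawn from the UTC-PAB database.

```python
def baseline_z_scores(scores, norm_mean, norm_sd):
    """Express each baseline score in normative standard-deviation units."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return [(s - norm_mean) / norm_sd for s in scores]


def flag_unexpected(scores, norm_mean, norm_sd, tolerance=2.0):
    """Return True for each score farther than `tolerance` SDs from the norm.

    The default tolerance of 2.0 is an illustrative assumption, not a
    recommendation taken from any normative database.
    """
    z_scores = baseline_z_scores(scores, norm_mean, norm_sd)
    return [abs(z) > tolerance for z in z_scores]


if __name__ == "__main__":
    # Hypothetical mean reaction times (ms) and hypothetical norms.
    rts = [500.0, 600.0, 450.0]
    print(baseline_z_scores(rts, norm_mean=500.0, norm_sd=50.0))
    print(flag_unexpected(rts, norm_mean=500.0, norm_sd=50.0, tolerance=1.5))
```

A check of this kind only indicates whether local baseline data are plausible relative to the norms; it does not by itself identify whether equipment, procedure, or subject differences account for a discrepancy.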

Thus,  one  of  the  major  goals  of  this  project  was  to  initiate  the  development  of  a  normative 
database  for  UTC-PAB  task  batteries,  especially  the  newest  battery  (the  STRES  Battery),  and 
thereby  provide  some  crucial  data  to  support  the  OMPAT  performance  database  clearinghouse, 
known  as  the  Performance  Information  Management  System  (PIMS).  To  accomplish  this  goal, 
baseline  data  were  collected  on  a  number  of  tasks  selected  from  the  STRES  Battery,  the  CTS,  and 
the  WRAIR  PAB  (Section  4.3). 

3.2  Reliability  of  UTC-PAB  Measures 

This project focused upon a number of unknown characteristics of UTC-PAB performance.
Central  among  these  unknown  characteristics  is  the  reliability  of  the  UTC-PAB  tasks.  Reliability 
generally  refers  to  the  degree  to  which  inherent  measurement  error  in  a  test  is  reduced,  thereby 
rendering  the  specific  measurement  repeatable  (Guilford,  1954;  Nunnally,  1967).  Thus,  reliability 
addresses  the  degree  to  which  a  measurement  is  consistent,  especially  over  time. 

Traditionally, there have been three general approaches to the measurement of reliability: split-half,
test-retest, and alternate form (Guilford, 1954). Each of these approaches provides distinctly
different  information  about  the  repeatability  of  a  measure.  Split-half  reliability  techniques  generally 
address  issues  related  to  the  internal  consistency  of  a  test  and  provide  little  value  for  highly 
structured  and  repetitive  human  performance  tasks.  For  the  purposes  of  this  project,  split-half 
reliability  is  probably  of  least  importance. 

Test-retest reliability techniques are primarily concerned with the stability of a measure over
time.  In  this  approach,  measurements  are  performed  on  two  (or  more)  occasions  and  compared. 
High  positive  correlations  between  the  measurements  suggest  that  the  psychological  functions  or 
abilities  that  are  measured  remained  stable  during  the  two  (or  more)  testing  sessions.  Test-retest 
techniques  thus  provide  a  fairly  convenient  method  for  assessing  the  repeatability  of  a  measure. 
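
The correlation underlying a test-retest estimate can be sketched as follows. This minimal Python example computes the Pearson product-moment correlation between scores from two testing sessions. The function name and the sample scores are hypothetical, and the choice of Pearson's r is an assumption made here for illustration rather than a statement of the coefficient used in this study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two testing sessions."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length sequences with at least 2 scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a session has no variance")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical mean reaction times (ms) for five subjects, tested twice.
    session_1 = [512.0, 480.0, 601.0, 455.0, 530.0]
    session_2 = [505.0, 490.0, 590.0, 470.0, 541.0]
    print(round(pearson_r(session_1, session_2), 3))
```

Because the coefficient reflects only the consistency of subjects' relative standing, a uniform practice gain across all subjects between sessions would leave it unchanged.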

There are some problems in test-retest reliability assessment, however. The stability of a
measure can be affected by factors that would normally contribute to error variance. These factors
would  include  the  subject's  health,  fatigue,  boredom,  emotions,  and  other  environmental  factors 
such  as  temperature,  lighting,  humidity,  etc.  Also,  the  experience  of  the  subject  during  the  first 
exposure  to  the  test  and  any  learning  during  the  interim  period  can  change  the  nature  of  the 
subject's  response  strategy  on  the  second  testing.  The  greater  the  delay  between  testing  periods, 
the  greater  the  potential  for  these  conditions  to  affect  the  test  score. 

Alternate  form  techniques  have  features  of  both  split-half  and  test-retest  approaches.  These 
techniques  incorporate  at  least  two  versions  of  the  measure,  versions  that  are  assumed  to  have 
equal  means  and  variances.  When  administration  of  both  tests  can  be  close  together  in  time,  the 
resulting  data  reflect  both  equivalence  in  test  content  and  stability  in  performance.  As  the  test-retest 
interval  increases,  the  alternate  form  technique  becomes  vulnerable  to  the  same  problems  affecting 
test-retest  approaches,  namely  fluctuations  in  internal  or  external  variables  that  affect  test  outcome. 
The  alternate  form  technique  thus  can  provide  information  about  the  equivalence  of  the 
"psychological-measurement  content"  across  the  test  instruments,  as  well  as  information  about 
stability  (Guilford,  1954;  see  also  Gulliksen,  1950;  Thorndike,  1951). 

Human  performance  testing  presents  some  especially  difficult  problems  in  reliability 
assessment.  First,  while  generally  simple  in  design,  many  of  the  tasks  utilized  in  human 
performance  testing  require  some  degree  of  practice  to  develop  proficiency.  Thus,  comparison  of 
initial trials with subsequent retest trials is usually subject to considerable influence by the factors
mentioned  above  that  can  moderate  test-retest  reliability.  As  a  result,  initial  trials  often  serve  as 
practice  trials  to  overcome  "learning  effects"  and  to  provide  more  stable  "baseline"  performance  in 
later trials. Unfortunately, the amount of training is often not standardized and, in some cases,
training  is  not  sufficient  or  not  feasible  due  to  methodological  considerations.  Even  after  sufficient 
training,  it  is  often  the  case  that  the  interval  between  baseline  testing  sessions  is  sufficiently  long 
that  fluctuations  in  internal  and  external  factors  may  be  large  enough  to  affect  reliability  estimates. 
For  many  tasks,  there  appears  to  be  a  continuous  decline  in  reliability  with  increasing  time  between 
testing  sessions.  In  fact,  Guilford  (1956)  notes  that  some  psychomotor  tests  may  yield  split-half 
reliabilities  of  0.90  to  0.95,  while  one-year  test-retest  reliabilities  only  reach  approximately  0.70. 

This  study  was  designed  to  provide  test-retest  reliability  data  over  several  time  intervals. 
These include 30 minutes, 24 hours, 1 week (5 days), and 3 weeks (19 days).

3.3  Training  Requirements  for  UTC-PAB 

This  study  provided  the  opportunity  to  partially  examine  the  learning  rate  and  the  training 
requirements  for  a  variety  of  tasks.  The  STRES  Battery  tasks  were  administered  ten  times  during 
the  training  sessions  and  the  CTS  and  WRAIR  PAB  tasks  were  administered  five  times  each. 
Some  limited  conclusions  can  be  drawn  from  these  data.  However,  the  reader  should  keep  in  mind 
that  the  gross  similarity  between  versions  of  the  same  task  on  both  the  CTS  and  STRES  Batteries 
has  the  effect  of  potentially  compounding  the  training  effect.  In  other  words,  the  training  a  subject 
received  on  the  CTS  tasks  could  have  easily  been  transferred  to  the  similar  STRES  Battery  tasks, 
and vice versa. Nonetheless, the data from the training sessions of this study should provide
valuable  information  about  the  training  requirements  of  these  batteries. 

3.4  Comparison  of  Similar  Tasks  Across  Batteries 

Many of these task batteries share common tasks. In fact, the CTS and STRES Batteries
share  no  fewer  than  five  tasks,  almost  identical  in  nature.  These  tasks  include:  Mathematical 
Processing,  Memory  Search  (Sternberg),  Spatial  Processing,  Grammatical  Reasoning,  and 
Unstable  Tracking.  Even  though  these  tasks  are  nearly  (or  even  completely)  conceptually  identical, 
they  often  bear  subtle  yet  crucial  differences  as  a  result  of  the  manner  in  which  they  are 
programmed  and  presented  visually.  Additional  differences  in  the  way  subjects  respond  to  these 
tasks  may  be  related  to  different  modes  of  subject  response,  such  as  keyboard  versus  special 
response  keypad.  One  simple,  yet  important,  question  this  study  was  designed  to  answer  was 
how  these  various  versions  of  the  same  task,  implemented  in  different  batteries,  compared  to  one 
another.  Of  particular  concern  were  the  five  tasks  shared  by  the  CTS  and  STRES  Batteries. 

3.5 Comparison of Group vs. Individual Testing

In  any  human  performance  training  situation,  especially  one  leading  to  a  large  normative 
database,  a  number  of  training  factors  might  influence  the  nature  of  the  data  collected.  One  factor 
of  importance  is  whether  the  collection  of  training  and  baseline  data  was  conducted  under  group  or 
individual  conditions. 

Group training and baseline data collection generally involve greater initial cost in terms of
acquiring  multiple  training/data  collection  stations.  However,  this  cost  is  often  outweighed  by  the 
ability  to  simultaneously  train  and  collect  data  on  larger  numbers  of  subjects  more  efficiently.  By 
contrast,  training  and  collecting  data  from  subjects  individually  requires  less  of  an  initial  facilities 
investment,  but  requires  a  considerable  investment  in  staff  resources  and  time. 

Typically,  the  decision  to  adopt  one  approach  or  the  other  is  based  on  available  equipment, 
laboratory  resources,  staff  resources,  and  the  time  available  to  complete  the  project. 
Unfortunately,  these  factors  do  not  address  perhaps  the  most  important  question.  Specifically, 
what  is  the  effect  on  the  subject's  performance  as  a  result  of  being  trained  and  assessed  in  groups, 
as  opposed  to  individual  training/testing  conditions?  Group  versus  individual  testing  dynamics 
have  been  investigated.  It  is  well-known  that  groups  can  have  significant  influences  on  the  actions 
and  attitudes  of  individuals  (Asch,  1951,  1956;  Myers,  1962),  and  vice  versa  (McGrath,  1962). 
In  fact,  the  simple  presence  of  others  can  have  "social  facilitation"  effects  that  can  increase  or 
decrease  performance  (Zajonc,  1965;  see  also  Bond  and  Titus,  1983;  Guerin,  1986). 

To  address  this  question,  portions  of  this  study  were  conducted  utilizing  group  and  individual 
testing  protocols.  The  data  collected  during  the  individual  training/testing  sessions  included  only 
training  and  baseline  testing  (i.e.,  no  retest  reliability,  deadline,  or  trial  length  investigations  were 
conducted).  These  data  provide  the  opportunity  to  assess  the  influence  of  group  versus  individual 
training  and  testing  conditions  on  performance  of  the  target  task  batteries. 

3.6  Effects  of  Task  Order  and  Battery  Sequence 

The  potential  influence  of  the  order  in  which  experimental  treatments  (or  tasks)  are  presented 
is  well-known  in  the  experimental  psychology  literature,  as  are  the  methods  for  addressing  the 
problem  methodologically  (Myers,  1980;  Underwood  and  Shaughnessy,  1975).  How  this 
phenomenon  influences  task  battery  performance  is  unclear.  Presumably,  order  effects  are  just  as 
potentially  threatening  in  task  battery  research  as  they  are  in  other  areas  of  behavioral 
experimentation.  However,  there  are  cases  in  human  performance  research  where  randomizing 
task  presentation  order  may  be  difficult  or  impossible.  In  fact,  the  AGARD  STRES  Battery 
guidelines specify a fixed sequence in which the tasks should be presented, but it is not known to
what  extent  performance  may  be  affected  by  this  sequence. 

Very  little  research  has  been  conducted  that  assesses  this  problem  in  the  area  of  task  battery 
research,  probably  because  most  researchers  just  assume  it  exists  and  routinely  counterbalance  task 
presentation  order  to  control  for  it.  However,  a  follow-on  analysis  of  the  Schlegel  and  Gilliland 
(1990)  investigation  of  CTS  performance  suggests  that  counterbalancing  may  not  always  be 
necessary.  In  this  analysis,  the  authors  compared  the  responses  of  subjects  who  were  presented 
the  CTS  in  a  different  order  than  that  presented  to  the  subjects  in  the  original  study.  The  results  of 
this  analysis  revealed  no  significant  differences  between  the  two  groups  on  any  of  the  major 
dependent  measures  of  the  task  battery.  While  random  presentation  of  tasks  is  generally  a  prudent 
strategy,  it  is  also  important  to  know  when  order  effects  are  a  problem  in  task  battery 
administration and, if order effects are a problem, to what degree. This project was designed to
assess the influence of various task presentation orders on performance. In addition, the study
was designed to assess the effect of the order of task battery presentation on performance.
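
One common way to vary task presentation order systematically is a cyclic Latin square, in which each task occupies each serial position exactly once across a set of orders. The Python sketch below is illustrative only: the function name is hypothetical, and nothing here should be read as the specific ordering scheme used in the present design. A cyclic square balances serial position but does not balance immediate carryover between particular pairs of tasks.

```python
def cyclic_latin_square(tasks):
    """Return len(tasks) presentation orders forming a cyclic Latin square.

    Row r is the task list rotated left by r positions, so every task
    appears once in every serial position across the set of orders.
    """
    n = len(tasks)
    if n == 0:
        raise ValueError("tasks must not be empty")
    return [[tasks[(row + col) % n] for col in range(n)] for row in range(n)]


if __name__ == "__main__":
    # The five tasks shared by the CTS and STRES Batteries.
    shared_tasks = [
        "Mathematical Processing",
        "Memory Search",
        "Spatial Processing",
        "Grammatical Reasoning",
        "Unstable Tracking",
    ]
    for order in cyclic_latin_square(shared_tasks):
        print(" -> ".join(order))
```

Comparing performance on a given task across the rows in which it appears early versus late gives a direct, if coarse, look at whether serial position matters for that task.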

3.7  Effects  of  Imposing  Response  Deadlines 

The  current  tasks  in  the  UTC-PAB  are  essentially  self-paced,  with  no  appreciable  response 
deadlines.  Version  1.0  of  the  CTS  imposed  response  deadlines  for  most  tasks.  Data  from  pilot 
studies  prior  to  two  major  data  collection  efforts  (Schlegel  and  Shingledecker,  1985;  Schlegel  and 
Gilliland,  1990)  pointed  to  the  fact  that  some  deadlines  were  very  strict  and  resulted  in  subject 
response  failures  on  an  unusually  high  number  of  trials.  For  other  tasks,  the  deadlines  provided 
little  or  no  incentive  for  faster  responses.  As  a  result,  the  experimental  testing  reported  in  Schlegel 
and  Gilliland  (1990)  was  conducted  using  the  Training  option  of  the  tasks.  This  option  provided 
15-second deadlines for all discrete response tasks. Based on this decision and the data from
Schlegel  and  Gilliland  (1990),  CTS  Version  2.0  uses  modified  deadlines.  Table  1  compares  the 
response  deadlines  for  Versions  1.0  and  2.0  of  the  CTS  and  for  the  UTC-PAB/AGARD  STRES. 

Table  1.  Response  Deadlines  (seconds)  for  CTS  and  STRES  Tasks. 


Task                       CTS V1.0    CTS V2.0    STRES

Grammatical Reasoning         6.5        15.0       15.0
Memory Search                 2.0         3.0        5.0
Mathematical Processing       3.0        15.0       15.0
Spatial Processing            2.5        15.0       15.0



There  is  no  doubt  that  the  imposition  of  a  response  deadline  affects  a  subject's  reaction  time 
and the percentage of response failures as a function of the strictness of the deadline. An important
question  is  how  much  the  subject's  actual  response  strategy,  as  reflected  by  the  frequency 
distribution  of  the  reaction  times,  is  affected. 

A  recent  pilot  study  involving  the  Memory  Search,  Grammatical  Reasoning,  and  Spatial 
Processing  tasks  from  the  UTC-PAB/AGARD  STRES  battery  confirmed  that  a  response  deadline 
typically  reduces  the  mean  reaction  time  for  a  session.  A  more  important  result  is  the  fact  that  the 
standard  deviation  of  reaction  times  for  correct  responses  is  significantly  reduced.  This  result 
holds  even  for  moderate  deadlines  for  which  the  percentage  of  response  failures  does  not 
appreciably  increase. 
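
The three session measures discussed above (mean reaction time, standard deviation of reaction times for correct responses, and percentage of response failures) can be sketched in Python as follows. The data layout, the function name, and the definition of a response failure as either no response or a response slower than the deadline are assumptions made for illustration; this is not the analysis code used in the pilot study.

```python
from statistics import mean, stdev


def summarize_deadline_session(trials, deadline_s):
    """Summarize one session of discrete-response trials.

    trials: list of (reaction_time_s, is_correct) pairs; reaction_time_s is
            None when the subject made no response (hypothetical layout).
    deadline_s: response deadline in seconds.
    """
    if not trials:
        raise ValueError("trials must not be empty")
    # Assumption: a missing response or one slower than the deadline is a failure.
    failures = sum(1 for rt, _ in trials if rt is None or rt > deadline_s)
    # Reaction times for correct responses made within the deadline.
    correct_rts = [rt for rt, ok in trials
                   if rt is not None and rt <= deadline_s and ok]
    return {
        "mean_rt": mean(correct_rts) if correct_rts else None,
        "sd_rt": stdev(correct_rts) if len(correct_rts) >= 2 else None,
        "pct_failures": 100.0 * failures / len(trials),
    }


if __name__ == "__main__":
    # Hypothetical trials scored against the 5.0-second STRES Memory Search deadline.
    session = [(1.0, True), (2.0, True), (3.0, False), (6.0, True), (None, False)]
    print(summarize_deadline_session(session, deadline_s=5.0))
```

Computing the same summary for sessions run with and without a deadline allows the reduction in reaction-time variability to be examined separately from any change in the failure rate.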

The  presence  of  a  response  deadline  appears  to  motivate  a  subject  to  respond  faster  while 
maintaining  an  acceptable  level  of  accuracy.  Deadlines  that  are  too  strict  place  an  added  time 
pressure  (stressor)  on  the  subject  and  may  unduly  impair  performance.  The  importance  of  this 
effect  on  subject  performance  measures  for  the  discrete  response  STRES  tasks  was  investigated  so 
that  reasonable  deadlines  may  be  established,  especially  if  they  are  helpful  in  reducing  performance 
variability  and  motivating  subjects  without  providing  additional  stress. 

3.8  Effects  of  Extended  Trial  Length 

The  UTC-PAB  and  related  task  batteries  have  relatively  fixed  trial  lengths  and  these  trial 
lengths  are  generally  short  (e.g.,  three  minutes).  While  short  trial  lengths  have  several  advantages, 
especially  ease  of  administration  and  efficiency,  they  may  also  represent  a  major  problem. 
Specifically,  it  is  unclear  whether  such  a  short  testing  epoch  reasonably  represents  general 
performance.  One  short  epoch,  even  after  practice  and  baseline  trials,  may  simply  not  be  sufficient 
to  capture  the  nature  of  more  generalized  trends  in  performance.  Even  a  small  sample  of  short 
epochs  may  be  insufficient,  especially  if  there  have  been  temporary  fluctuations  in  external 
variables  or  abilities  as  mentioned  above. 

One  factor  that  may  distort  the  accurate  view  of  treatment  effects  on  performance  during  short 
trial  lengths  is  the  temporary  recruitment  of  abilities  that  takes  place  upon  initiation  or  change  in 
workload.  Borrowing  from  Selye's  (1976)  now  famous  physiological  theory  of  stress  adaptation, 
when  faced  with  increased  stress,  organisms  enact  compensatory  mechanisms  (i.e.,  recruitment 
processes)  to  cope  with  the  increased  demand.  The  organism  resists  (copes)  for  some  period  and 
then  collapses  at  rates  related  to  the  magnitude  of  the  stressor.  Recent  work  in  such  areas  as 
selective  attention,  work  strategy,  dual-task  paradigms,  and  the  multiple  resources  models  of 
cognitive processing suggests that similar recruitment processes operate at the cognitive (as well as
the  physiological)  level. 

It  is  conceivable  that  short  trial  lengths  may  be  providing  data  during  that  period  in  which  the 
person  is  making  a  special  effort  to  recruit  resources  to  perform  the  task.  Thus,  it  is  possible  that  a 
subject  could  recruit  resources  under  the  most  trying  circumstances  to  perform  well  for  three 
minutes.  Additionally,  the  initial  moments  of  task  performance  (even  after  practice)  could  be  the 
most  unstable.  Extending  the  trial  length  or  sampling  data  beyond  a  set  period  of  on-task  time  may 
allow  for  sampling  during  more  stable  periods  of  performance,  or  at  least  periods  more  reflective 
of  general  performance  levels. 

This  line  of  logic  has  serious  implications  for  the  issue  of  task  sensitivity,  that  is,  the 
capability  of  the  task  to  adequately  measure  treatment  effects.  If  cognitive  resource  recruitment 
does  take  place,  and  if  it  takes  place  in  a  compensatory  form  in  the  first  few  minutes  of  task 
performance  or  following  a  stressor,  then  extending  trial  length  would  allow  sampling  of 
performance  under  conditions  of  generalized  resistance  (or  acclimation)  rather  than  during 
compensatory  recruitment.  The  opening  minutes  of  task  performance,  protected  by  recruitment 
processes,  would  provide  the  maximum  ability  to  resist  the  effects  of  not  only  workload  onset,  but 
also  external  variables  of  interest  such  as  environmental  stressors  and  drug  effects.  Longer  time  on 
task  would  provide  the  opportunity  to  pass  through  the  initial  recruitment  phase  and  assess  the 
influence  of  variables  on  performance  under  conditions  during  resistance,  acclimation,  or 
exhaustion. 

Some  recent  pilot  data  from  the  authors'  laboratory  suggest  that  trial  length  may  be  a 
significant  task  parameter.  Subjects  were  given  practice  trials  on  the  CTS  Memory  Search  task  and 
then, in counterbalanced order, a 3-minute trial and a 21-minute trial (actually seven 3-minute trials in
rapid succession). The results indicated that, depending on the epoch of the 21-minute period,
subject data varied greatly in comparison to the 3-minute data. While these data are very
preliminary,  they  do  suggest  that  an  increased  understanding  of  the  effect  of  trial  length  is 
important,  especially  with  regard  to  its  implications  for  task  sensitivity. 

In  this  study,  trial  length  was  examined  by  repeating  the  administration  of  the  standard 
3-minute  trials  of  the  same  task.  Performance  during  6-minute,  12-minute,  and  24-minute  testing 
sessions  was  examined  and  compared  to  data  from  the  standard  trial  length. 
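
The comparison of extended sessions with the standard trial length can be sketched as grouping trials into consecutive 3-minute epochs and summarizing each epoch separately. The following Python sketch assumes a hypothetical data layout of (onset time, reaction time) pairs and a hypothetical function name; it is not the analysis code used in this study.

```python
def mean_rt_by_epoch(trials, epoch_s=180.0):
    """Mean reaction time within each consecutive epoch of a session.

    trials: iterable of (onset_s, reaction_time_s) pairs, where onset_s is
            the time since the session started (hypothetical layout).
    epoch_s: epoch length in seconds; 180 s matches the standard 3-minute trial.
    Returns a dict mapping epoch index (0 = first epoch) to mean reaction time.
    """
    if epoch_s <= 0:
        raise ValueError("epoch_s must be positive")
    buckets = {}
    for onset_s, rt in trials:
        # Integer division assigns each trial to the epoch containing its onset.
        buckets.setdefault(int(onset_s // epoch_s), []).append(rt)
    return {epoch: sum(rts) / len(rts) for epoch, rts in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical trials from the first nine minutes of an extended session.
    data = [(10.0, 0.5), (100.0, 0.75), (200.0, 1.0), (400.0, 1.5)]
    print(mean_rt_by_epoch(data))
```

Epoch 0 of an extended session corresponds to the standard 3-minute trial, so later epochs can be compared against it directly.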

3.9 Usefulness of Psychometric State Measures

There  have  been  numerous  attempts  to  measure  psychological  states  psychometrically.  In 
many  cases  these  attempts  have  been  useful  for  assessing  the  influence  of  a  variety  of 
psychological  and  physiological  stressors.  For  example,  the  developers  of  the  WRAIR  PAB  have 
included  psychometric  scales  to  assess  various  psychological  states,  especially  in  relation  to  drug 
and  disease  effects. 

For  the  purposes  of  this  study,  both  the  Mood  Scale  II  and  the  Stanford  Sleepiness  Scale  from 
the  WRAIR  PAB  were  included  in  the  testing  protocol.  It  was  believed  that  aside  from  the  value  of 
developing  some  normative  data  for  these  scales,  they  might  be  useful  as  convergent  measurements 
of  task  demand  or  other  factors  such  as  cumulative  workload  demand  across  the  test  session. 
Thus,  the  use  of  these  scales  was  an  exploratory  attempt  to  confirm  or  identify  a  relationship 
between  performance  changes  or  differences  and  concomitant  changes  in  mood. 

3.10  Software  Analysis/Evaluation 

The  task  batteries  utilized  in  this  project  varied  in  the  degree  to  which  they  were  "field 
proven."  During  the  course  of  this  project,  a  number  of  issues  or  problems  related  to  software, 
hardware,  and  such  matters  as  instructional  sets  and  data  management  were  identified.  Feedback 
was  provided  to  the  developers  of  the  individual  batteries.  This  feedback  has  resulted  in 
corrections and modifications incorporated in updated versions of the various batteries.


4.0  PROJECT  DESIGN  AND  METHOD 
4.1  Project  Design 

As  noted  in  the  previous  section,  this  was  a  comprehensive  research  project  aimed  at 
addressing  a  wide  range  of  research  needs  from  fairly  basic  normative  data  through  fundamental 
reliability  and  validity  data  to  theory-driven  experimental  data.  To  accomplish  such  a  research 
effort,  the  project  was  designed  so  as  to  provide  a  high  degree  of  structure,  yet  provide  the 
flexibility  to  explore  basic  research  questions.  Thus,  by  its  nature,  the  project  required  a  complex 
design  incorporating  tradeoffs  between  competing  research  needs.  For  example,  to  ensure  enough 
subjects  for  a  stable  normative  data  base,  training  on  all  tasks  (across  batteries)  had  to  occur 
simultaneously.  While  this  may  raise  some  questions  about  skill  transfer  between  the  batteries 
during  training,  it  was  deemed  that  the  baseline  data  were  of  more  importance  and  that  some 
contamination  of  the  training  data  was  acceptable  as  a  tradeoff.  Any  serious  limitations  of  the  data 
set will be noted in the results section of this report.

Figure  1  presents  an  overview  of  the  design  and  testing  protocol  for  this  project.  Data  for  this 
project  were  collected  under  the  direction  of  two  research  teams:  (1)  Dr.  Robert  E.  Schlegel  and 
Dr.  Kirby  Gilliland  at  the  University  of  Oklahoma,  and  (2)  Mr.  Gary  F.  Reid  and  Mr.  Mark  S. 
Crabtree at Armstrong Laboratory, Wright-Patterson AFB. As noted in Figure 1, Orientation,
Training,  and  Baseline  testing  (Week  2,  Days  1  and  2)  for  establishing  the  UTC-PAB  normative 
database  and  for  assessing  group  vs.  individual  testing  effects  was  conducted  for  all  subjects  at 
both  locations.  This  phase  of  the  project  involved  an  orientation  session  followed  by  five  days  of 
training sessions (labeled "T" on Figure 1), plus two days of baseline sessions (B). All subjects
completed the selected STRES, CTS, and WRAIR PAB tasks. More details regarding orientation
and  specific  data  collection  procedures  will  be  provided  in  the  Experimental  Procedure  Section  (see 
Section  4.6).  Data  addressing  additional  research  questions  such  as  one-week  and  three-week 
reliability  (R3,  R4),  deadline  effects  (D)  and  extended  trial  lengths  (E)  were  derived  from 
subsequent  testing  sessions  conducted  only  at  the  University  of  Oklahoma.  These  additional 
sessions  involved  testing  on  all  tasks  for  the  reliability  test  sessions.  Data  collection  for  the 
deadline  and  extended  trial  sessions  was  restricted  to  the  STRES  battery  tasks. 

All  Armstrong  Laboratory  subjects  were  trained  and  tested  individually.  Subjects  at  the 
University  of  Oklahoma  were  tested  in  groups  of  eight.  The  basic  testing  protocol  for  Armstrong 
Laboratory subjects lasted seven days. The protocol for University of Oklahoma subjects required
approximately  five  weeks.  Two  complete  cycles  of  the  five-week,  University  of  Oklahoma  testing 
protocol were needed to acquire data from the specified minimum number of subjects.

Figure 1. UTC-PAB Project Design.
R2 = one-day test-retest
R3 = one-week test-retest
R4 = three-week test-retest


4.2  Subjects 


Subjects  were  recruited  from  the  campuses  of  the  University  of  Oklahoma  and  Wright  State 
University,  and  the  experimental  procedures  were  implemented  under  the  authorization  of  their 
respective  Institutional  Review  Boards  and  in  accordance  with  AFR  169-3.  To  control  for  possible 
performance  variability  due  to  gender,  only  male  subjects  were  selected.  Also,  due  to  the  verbal 
nature  of  many  of  the  tasks  and  the  instructional  sets,  only  native  English  speaking  subjects  were 
recruited.  Subjects  were  screened  for  gross  hearing  and  visual  impairments.  Details  of  the 
screening  process  are  presented  in  the  Subject  Recruitment,  Screening,  and  Orientation  Procedures 
Section  (see  Section  4.6.1).  Table  2  provides  general  summary  information  regarding  the  two 
subject  samples. 


Table 2. Subject Characteristics.

                      Group Administration    Individual Administration
                      (Oklahoma)              (Armstrong Lab)

Number (N=)           64                      15
  Deadline Study      (33)                    -
  Extended Trials     (31)                    -

Age
  Mean                21.0                    21.6
  Std. Dev.           3.2                     3.1
  Range               18-36                   18-27

Right Handed          59 (91%)                15 (100%)

Class
  Freshman            21 (33%)                6 (37.5%)
  Sophomore           18 (28%)                6 (37.5%)
  Junior              14 (22%)                0 (0%)
  Senior              7 (11%)                 4 (25%)
  Graduate            4 (6%)                  0 (0%)

GPA
  Mean                2.82                    2.89
  Std. Dev.           0.59                    0.51
  Range               1.60-3.84               2.20-4.00

All subjects were paid for their participation in the project. Because this was a multi-session
experiment, a bonus system was used to increase motivation and completion rate. Subjects who
successfully completed the study were given a bonus payment. Armstrong Laboratory subjects
received $80.00 for participating in the initial two-hour orientation and the subsequent seven
two-hour training and baseline sessions. These subjects were paid an additional $15.00 for attending
an end-of-study debriefing session. University of Oklahoma subjects participated in five additional
two-hour sessions. They were paid $4.00 per hour plus a $1.00 bonus for each hour if they
successfully completed the study. Their bonus was $24.00, for a total of $120.00 for successful
completion of the study.
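
The payment figures can be checked with a short calculation. The sketch below is illustrative only. The function name and the hour counts are assumptions: 24 paid hours for University of Oklahoma subjects is inferred from the stated $24.00 bonus at $1.00 per hour, and the 16 hours for Armstrong Laboratory subjects follows from one two-hour orientation plus seven two-hour sessions. Applying the Oklahoma rates to the Armstrong hours happens to reproduce the stated $80.00, although the text does not say how that figure was derived.

```python
def total_payment(hours: float, base_rate: float = 4.00,
                  bonus_rate: float = 1.00, completed: bool = True) -> float:
    """Base pay plus the per-hour completion bonus (hypothetical helper)."""
    bonus = bonus_rate * hours if completed else 0.0
    return base_rate * hours + bonus


# Oklahoma: 24 paid hours is inferred from the $24.00 bonus (assumption).
oklahoma_total = total_payment(24)

# Armstrong: orientation plus seven sessions, two hours each (16 hours).
armstrong_total = total_payment(2 + 7 * 2)

print(oklahoma_total, armstrong_total)
```

A subject who withdrew early would, under these assumed rates, receive only the base pay for the hours attended (`completed=False`).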

4.3  Task  Selection 

4.3.1  Performance  Tasks 

Because  the  UTC-PAB/AGARD  STRES  battery  was  the  primary  focus  of  this  project,  the 
tasks  included  in  this  battery  took  a  higher  priority  in  selection.  There  was  also  a  definite  interest 
in  including  CTS  tasks  because  the  CTS  is  one  of  the  few  batteries  for  which  there  is  an 
established  database  (Schlegel  and  Gilliland,  1990).  Another  important  question  was  the  degree  of 
similarity  between  tasks  common  to  both  the  STRES  battery  and  the  CTS.  Therefore,  tasks  were 
selected  to  maximize  the  information  gained  regarding  the  STRES  battery  while  at  the  same  time 
affording  a  maximum  level  of  information  on  the  CTS  tasks,  as  well  as  comparative  information 
across  these  batteries. 

Another  UTC-PAB  related  battery  frequently  used  for  screening  and  selection  is  the  WRAIR 
PAB.  While  there  is  less  overall  overlap  between  the  WRAIR  PAB  and  the  batteries  mentioned 
previously,  there  is  some  task  overlap  and  there  are  some  additional  tasks  that  are  unique  to  the 
WRAIR  PAB  and  worthy  of  comparison.  The  tasks  selected  for  inclusion  in  this  project  are  listed 
in  Table  3. 

Five  of  the  tasks  that  were  examined  in  this  study  were  implemented  in  very  similar  versions 
in  the  STRES  and  CTS  batteries.  They  are  (1)  Grammatical  Reasoning,  (2)  Mathematical 
Processing,  (3)  Memory  Search,  (4)  Spatial  Processing,  and  (5)  Unstable  Tracking.  Each  of  these 
tasks  is  described  in  detail  below.  Additional  information  can  be  found  in  Shingledecker  (1984), 
Englund, Reeves, Shingledecker, Thorne, Wilson, and Hegge (1987), and AGARD (1989).

Table 3. Task List.

Battery Comparison

STRES                                   CTS
Grammatical Reasoning (GRM)             Grammatical Reasoning (GR)
Mathematical Processing (MTH)           Mathematical Processing (MP)
Sternberg - 2 Character (STN2)
Sternberg - 4 Character (STN4)          Memory Search - 4 Character (MS)
Spatial Processing (SPA)                Spatial Processing (SP)
Unstable Tracking                       Unstable Tracking (UT)

Supplemental Tasks

WRAIR                                   STRES
Manikin Task (MAN)                      Reaction Time (6 Blocks) (RCT)
Time Wall (TIM)                         Dual-Task Combination (CBO)
Interval Production

Subject State Measures*

Stanford Sleepiness Scale (STA)
Mood Scale II (MOO)

* collected before and after STRES battery runs

Grammatical Reasoning. This task requires subjects to respond true or false to a pair of
simple statements that describe the ordinal relationship of symbols (e.g., @ # *). For
example, the subject is presented the following:

@ PRECEDES #

# PRECEDES *

@ # *

The subject determines whether the first statement is true or false by examining the order
of the symbols on the bottom line. The subject then determines whether the second statement
is true or false. If both statements are true, or if both statements are false, the subject
responds by pressing the "match" button. If one statement is true and the other is false, then
the subject presses the "non-match" button. In this example, the subject's response would be
MATCH.

There are 64 possible statement variations. Each statement is presented once before any
statement is repeated during a trial. The subject would normally be required to respond about
once every 3 to 5 seconds during the three-minute trial. Response accuracy and reaction time
are recorded on disk. Percent correct and mean response time for correct responses were
presented to the subject upon task completion.
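
The decision rule for the Grammatical Reasoning task can be sketched in a few lines. This is a minimal illustration, not the battery's actual implementation: the function names are hypothetical, only the PRECEDES relation shown in the example is handled, and "precedes" is assumed to mean "appears anywhere to the left of".

```python
def statement_is_true(statement: str, symbol_order: str) -> bool:
    """Return True if 'X PRECEDES Y' holds for a symbol line such as '@ # *'."""
    first, relation, second = statement.split()
    if relation != "PRECEDES":
        # Only the relation shown in the example is handled in this sketch.
        raise ValueError(f"unsupported relation: {relation}")
    symbols = symbol_order.split()
    # Assumption: X precedes Y if it appears anywhere to the left of Y.
    return symbols.index(first) < symbols.index(second)


def correct_response(statement_1: str, statement_2: str, symbol_order: str) -> str:
    """MATCH if both statements share a truth value, otherwise NON-MATCH."""
    truth_1 = statement_is_true(statement_1, symbol_order)
    truth_2 = statement_is_true(statement_2, symbol_order)
    return "MATCH" if truth_1 == truth_2 else "NON-MATCH"


print(correct_response("@ PRECEDES #", "# PRECEDES *", "@ # *"))  # MATCH
```

Note that two false statements also yield MATCH, which is why the response is framed as match/non-match rather than true/false.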

Mathematical  Processing.  In  this  task,  simple  problems  involving  multiple  arithmetic 
operations  are  presented,  one  at  a  time,  and  the  subject  calculates  whether  the  solution  is  less 
than  or  greater  than  5.  The  subject  is  instructed  to  respond  by  pressing  a  key  designated  to 
indicate  "less  than"  or  "greater  than."  For  example: 

3+8-2= 


In this example, the answer is 9 and the subject would press the key designated to
indicate "greater than 5." The subject may receive up to 50 presentations during a
three-minute trial. Response accuracy and reaction time are recorded on disk. Percent correct and
mean response time for correct responses were presented to the subject upon task completion.
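
The Mathematical Processing decision can be expressed as a short sketch. The function names are hypothetical, and the parser assumes problems contain only single additions and subtractions of integers, as in the example; the text does not say which operations the battery actually uses. A solution of exactly 5 is treated as undefined because the task offers only "less than" and "greater than" responses.

```python
import re


def solve(problem: str) -> int:
    """Evaluate a problem such as '3+8-2=' (addition and subtraction only)."""
    expression = problem.replace(" ", "").rstrip("=")
    # Each term is captured together with its leading sign, e.g. '3', '+8', '-2'.
    terms = re.findall(r"[+-]?\d+", expression)
    return sum(int(term) for term in terms)


def correct_key(problem: str) -> str:
    """Return the response that the subject should give for the problem."""
    value = solve(problem)
    if value == 5:
        raise ValueError("a solution of exactly 5 has no defined response")
    return "greater than" if value > 5 else "less than"


print(solve("3+8-2="), correct_key("3+8-2="))  # 9 greater than
```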

Memory  Search  (Sternberg).  In  this  task,  subjects  are  required  to  memorize  a  set  of 
either  two  or  four  letters.  Then,  as  letters  appear  on  the  screen  one  at  a  time,  the  subject 
decides  if  each  letter  appearing  on  the  screen  is  a  member  of  the  memorized  set.  The  subject 
responds  "yes"  or  "no"  using  assigned  keys.  Up  to  100  letters  may  be  presented  during  a 
three-minute  trial.  Response  accuracy  and  reaction  time  are  recorded  on  disk.  Percent  correct 
and  mean  response  time  for  correct  responses  were  presented  to  the  subject  upon  task 
completion. 
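
The summary feedback described for the discrete tasks (percent correct and mean response time for correct responses) can be computed along the following lines. This is a sketch under stated assumptions: the function name, the dictionary keys, the input layout (one reaction time in seconds plus a correctness flag per stimulus), and the use of the sample standard deviation are all illustrative choices, not details taken from the battery software.

```python
from statistics import mean, stdev


def score_trial(responses):
    """Summarize one trial.

    responses: list of (reaction_time_seconds, was_correct) tuples,
    one entry per stimulus presented.
    """
    if not responses:
        raise ValueError("a trial must contain at least one response")
    correct_rts = [rt for rt, was_correct in responses if was_correct]
    return {
        # Reaction-time statistics use correct responses only.
        "mean_rt_correct": mean(correct_rts) if correct_rts else None,
        "sd_rt_correct": stdev(correct_rts) if len(correct_rts) > 1 else None,
        "proportion_correct": len(correct_rts) / len(responses),
        "n_stimuli": len(responses),
    }


summary = score_trial([(0.5, True), (0.7, True), (0.9, False), (0.6, True)])
print(summary)
```

Restricting the reaction-time statistics to correct responses follows the wording of the task descriptions; incorrect responses affect only the proportion correct.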

Spatial  Processing.  This  task  requires  that  the  subject  view  a  four-bar  column  chart 
(called the "target" stimulus) for 1 second. The bars are approximately 0.5 cm wide with 0.5 cm
spacing between them, and their height varies from 1.0 to 6.0 cm. After the target stimulus
disappears, a "comparison" stimulus appears that is rotated either 90 or 270 degrees from the
original target position. The bar lengths of the comparison stimulus may be the same as or
different from those of the target stimulus. The subject responds using keys designated as "same" and
"different."  Fifty  to  sixty  of  these  stimulus  pairs  may  be  presented  in  a  three-minute  trial. 
Response  accuracy  and  reaction  time  are  recorded  on  disk.  Percent  correct  and  mean  response 
time  for  correct  responses  were  presented  to  the  subject  upon  task  completion. 

Unstable  Tracking.  This  task  presents  to  the  subject  a  cursor  moving  horizontally  on  the 
screen.  Depending  on  the  computer  system  being  used,  a  knob  or  a  joystick  is  moved  by  the 
subject  in  order  to  keep  the  cursor  centered  on  the  screen.  This  task  requires  continuous 
subject  control  for  the  duration  of  a  three-minute  trial.  The  subject  inputs  required  in  this  task 
are similar to those required by simple video games. Root mean square (RMS) tracking error
and control losses are recorded on disk for each second of task performance. Tracking RMS
error and total edge violations (control losses) were presented to the subject upon task
completion.
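
The two tracking measures can be illustrated with a short sketch. The function names, the representation of performance as a list of sampled cursor deviations from screen center, and the rule that one excursion beyond the edge counts as one violation are assumptions made for illustration; the text does not describe how the battery samples the cursor or detects a control loss.

```python
import math


def rms_error(deviations):
    """Root mean square of cursor deviations from the screen center."""
    if not deviations:
        raise ValueError("at least one sample is required")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def count_edge_violations(deviations, edge_limit):
    """Count excursions to or beyond the edge (one count per excursion)."""
    violations = 0
    outside = False
    for d in deviations:
        now_outside = abs(d) >= edge_limit
        if now_outside and not outside:
            violations += 1  # cursor has just reached the edge
        outside = now_outside
    return violations


samples = [0, 5, 10, 10, 3, -10, 0]
print(rms_error(samples), count_edge_violations(samples, edge_limit=10))
```

Squaring the deviations before averaging means that errors to the left and right of center do not cancel, and large excursions are weighted more heavily than small ones.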


These  five  tasks  are  popular  laboratory-based  tasks  used  by  psychologists,  human  factors 
specialists,  and  other  behavioral  scientists  to  explore  fundamental  properties  of  human 
performance.  These  tasks  are  commonly  found  in  some  combination  in  a  number  of  task  batteries. 
However,  it  should  be  remembered  that  even  similar  versions  of  such  tasks  within  a  family  of 
batteries,  such  as  the  UTC-PAB  group  of  batteries,  can  have  large  or  subtle  variations  that  may 
lead  to  overall  performance  score  differences.  For  example,  the  STRES  version  of  the  Spatial 
Processing  task  presents  bars  that  are  non-filled  and  narrower  in  their  general  proportion  compared 
to  those  in  the  CTS  version.  It  may  be  the  case  that  these  graphical  features  of  the  column  graph 
stimuli  differentially  affect  performance.  For  this  reason,  this  project  was  designed  to  provide 
comparisons of those tasks implemented in different batteries.

In  addition  to  the  preceding  tasks  that  are  common  primarily  to  the  STRES  and  CTS  batteries, 
the  Reaction  Time  Task  and  the  Memory  Search/Unstable  Tracking  Dual  Task  (COMBO)  from  the 
STRES Battery were included. The following tasks from the WRAIR PAB were also included: (1)
Manikin task, (2) Time Wall task, and (3) Interval Production task. The Mood Scale II and the
Stanford  Sleepiness  Scale,  both  from  the  WRAIR  PAB,  were  also  incorporated  in  the  testing 
protocol. 

Reaction  Time.  The  Reaction  Time  task  presents  numbers  from  two  to  five  on  the  left  or 
right  side  of  the  screen.  Some  of  the  numbers  are  degraded  in  appearance,  are  temporally 
unpredictable,  require  multiple  key  presses,  or  require  the  subject  to  "switch  hands"  in  terms 
of  response  mapping.  The  subject  is  instructed  to  press  one  of  two  keys  to  indicate  the  side  of 
presentation  on  the  screen  and  whether  the  number  is  a  two  or  three  versus  a  four  or  five. 
Reaction  time  and  accuracy  are  recorded.  Total  session  length  for  six  different  testing 
conditions  lasts  approximately  15  minutes.  The  word  "error"  was  displayed  on  the  screen 
following  each  incorrect  response.  Percent  correct  and  mean  response  time  were  presented  to 
the  subject  upon  task  completion. 

Manikin  Task.  This  task  presents  to  the  subject  a  male  figure  holding  a  green  square  in  one 
hand  and  a  red  circle  in  the  other  hand.  The  objects  are  not  always  in  the  same  hand.  The 
manikin  may  be  shown  standing  upright  or  upside  down,  and  facing  toward  the  subject  or 
facing  away  from  the  subject.  Encircling  the  manikin  is  either  a  red  circular  border  or  a  green 
square border. For each stimulus presentation, the border signifies which manikin-held object
is the target figure. The subject must decide in which hand the manikin is holding the target
figure,  and  then  press  a  key  to  signify  the  right  or  left  hand.  Sixteen  different  stimulus  figures 
are  possible.  Three  complete  sets  of  the  16  stimuli  (i.e.,  a  total  of  48  stimuli)  constituted  a 
single  trial  lasting  two  to  three  minutes.  Response  accuracy  and  reaction  time  are  recorded. 
Percent  correct  and  mean  response  time  were  presented  to  the  subject  upon  task  completion. 
During  orientation  and  training  days  1  and  2,  each  response  was  followed  by  the  presentation 
of  a  "C"  for  a  correct  response  or  an  "E"  for  an  error.  After  training  day  2,  the  feedback 
following  each  response  was  no  longer  presented. 
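
The count of sixteen stimulus figures follows from four two-valued attributes: which object is in which hand, upright or upside down, facing toward or away, and which border is shown. The sketch below enumerates them. The names and the dictionary layout are hypothetical, and the presentation order within a trial is not modeled because the text does not describe it.

```python
from itertools import product

OBJECTS = ("green square", "red circle")


def all_stimuli():
    """Enumerate the 16 stimulus figures as dictionaries."""
    stimuli = []
    for right_hand, orientation, facing, target in product(
        OBJECTS, ("upright", "upside down"), ("toward", "away"), OBJECTS
    ):
        stimuli.append(
            {
                "right_hand": right_hand,  # object in the manikin's right hand
                "orientation": orientation,
                "facing": facing,
                "target": target,  # object signified by the border
            }
        )
    return stimuli


def correct_hand(stimulus):
    """Return which of the manikin's hands holds the target object."""
    return "right" if stimulus["right_hand"] == stimulus["target"] else "left"


stimuli = all_stimuli()
trial = stimuli * 3  # three complete sets of the 16 stimuli
print(len(stimuli), len(trial))  # 16 48
```

Orientation and facing direction do not change the correct answer in this representation; they change only how the figure appears on the screen, which is what makes the task a test of spatial transformation.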

Time  Wall.  This  task  presents  the  subject  with  a  small,  red  square  at  the  top  of  the  screen. 
The  square  drops  at  a  constant  rate.  After  traveling  approximately  two-thirds  of  the  distance 
down  the  screen,  the  square  is  obscured  by  a  red  wall.  At  the  bottom  of  the  wall  there  is  an 
apparent  open  space  into  which  the  falling  square  should  eventually  land.  The  subject  presses 
a  key  when  he  or  she  thinks  the  square  has  had  time  to  reach  the  open  space.  Thus,  the 
subject's time prediction is the critical dependent measure. As soon as the subject responds, a
new  square  appears  at  the  top  of  the  screen  and  the  task  is  repeated.  The  task  lasts  less  than 
two  minutes  for  a  total  of  ten  squares.  The  mean  of  the  estimated  time  intervals  for  the  total 
travel  of  the  square  was  presented  to  the  subject  upon  task  completion. 

Interval  Production.  The  Interval  Production  task  simply  requires  that  the  subject  press  a 
specified  key  at  regular  intervals  of  approximately  one  second.  On  the  screen,  the  subject  sees 
a  circle  with  a  pointer  much  like  a  clock  hand.  When  the  subject  presses  the  response  key,  the 
pointer advances 1/60th of the circle. The trial lasts until 60 responses have been made. The
duration  between  key  presses  was  recorded.  Upon  completion  of  the  task,  the  subject  was 
presented  with  the  mean  interval  duration. 
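
The Interval Production summary can be sketched as follows. The function name and the input format (a list of key-press timestamps in seconds) are assumptions, as is the use of the sample standard deviation. Note that under this representation 60 key presses yield 59 intervals; the text does not say whether the first interval is timed from the start of the trial.

```python
from statistics import mean, stdev


def interval_stats(press_times):
    """Mean and standard deviation of the durations between key presses."""
    intervals = [later - earlier
                 for earlier, later in zip(press_times, press_times[1:])]
    if len(intervals) < 2:
        raise ValueError("at least three key presses are required")
    return mean(intervals), stdev(intervals)


# Perfectly regular one-second tapping gives a mean of 1.0 and no variability.
print(interval_stats([0.0, 1.0, 2.0, 3.0]))  # (1.0, 0.0)
```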

The  five  previously  described  tasks  common  to  the  STRES  battery  and  CTS  (plus  the 
combined  dual  task  variation),  together  with  the  four  tasks  described  above,  constituted  the  total 
task  configuration  administered  in  the  training,  baseline,  and  retest  phases  of  this  project.  Each  of 
these  tasks  yields  a  number  of  dependent  measures  such  as  mean  and  standard  deviation  of  the 
response  time  for  correct  and  incorrect  responses,  percent  correct,  etc.  In  fact,  some  of  the  tasks 
yield  far  more  measures  than  can  reasonably  be  evaluated.  For  the  purposes  of  this  project, 
dependent  measures  for  summary  presentation  and  analysis  were  restricted  to  a  selected  group  of 
primary  measures  deemed  most  relevant.  These  dependent  measures  are  listed  in  Table  4.  The 
specific  administration  order  of  the  tasks  will  be  described  below  under  Experimental  Procedure 
(Section 4.6).

Table 4. Response Measures.

Code              Description

STRES/CTS Discrete Tasks

  "Overall" for all stimuli
   *xxMNO         Mean RT for Correct Responses
    xxSDO         Std. Dev. of RT for Correct Responses
   *xxPCO         Proportion of Correct Responses
    xxSTIMO       Number of Stimuli

  "Positive" type stimuli
    xxMNP         Mean RT for Correct Responses
    xxSDP         Std. Dev. of RT for Correct Responses
    xxPCP         Proportion of Correct Responses

  "Negative" type stimuli
    xxMNN         Mean RT for Correct Responses
    xxSDN         Std. Dev. of RT for Correct Responses
    xxPCN         Proportion of Correct Responses

STRES/CTS Unstable Tracking
   *UTEV          Number of Edge Violations
   *UTRMS         Root Mean Square (RMS) Error

STRES Reaction Time
   *RTMN          Mean RT for Correct Responses
   *RTSD          Std. Dev. of RT for Correct Responses
   *RTPC          Proportion of Correct Responses

WRAIR Manikin Task
   *MANMNCR       Mean RT for Correct Responses
   *MANPC         Proportion of Correct Responses

WRAIR Time Wall/Interval Production
   *TIMMN/INTMN   Mean Time Estimate/Interval
   *TIMSD/INTSD   Std. Dev. of Time Estimate/Interval

* Primary dependent measures used in analyses

4.3.2 Subjective Psychometric Scales

In  addition  to  the  performance  tasks,  two  psychometric  scales  of  psychological  state  were 
administered.  Scales  of  this  type  are  often  used  as  simple  dependent  measures,  as  convergent 
measures,  or  as  measures  to  verify  the  validity  of  a  manipulation  within  psychological  research. 
The  two  scales  were  selected  from  the  WRAIR  PAB.  These  two  subjective  tests  were  completed 
by  each  subject  following  each  complete  STRES  battery  administration.  Both  of  these  scales  were 
presented  on  the  computer  monitor.  Subjects  responded  by  simply  pressing  keys  corresponding  to 
the  appropriate  subjective  response.  The  scales  used  were  the  Mood  Scale  II  and  the  Stanford 
Sleepiness  Scale. 

Mood Scale II. The Mood Scale II is a variation of the Profile of Mood States (POMS;
McNair, Lorr, and Droppleman, 1971). The Mood Scale II has 36 items addressing the
following  six  factors:  Activity,  Happiness,  Depression,  Anger,  Fatigue,  and  Fear. 

ILLUSTRATION  of  MOOD  SCALE  II 

You  will  be  given  a  list  of  words  that  people  often  use  to  describe  how  they  feel  followed 
by  the  numbers  1  to  3.  These  numbers  represent  the  degree  to  which  each  word  describes 
how  you  feel: 

1 = "NOT AT ALL"

2  =  "SOMEWHAT  OR  SLIGHTLY" 

3  =  "MOSTLY  OR  GENERALLY" 

Indicate how each word applies to HOW YOU FEEL NOW, by pressing '1', '2' or '3'.
(The  following  words  were  presented  one  at  a  time,  and  the  subject  responded  following  the 
presentation  of  each  word.) 

MISERABLE,  UNEASY,  INACTIVE,  ENERGETIC,  BLUE,  GROUCHY,  LIVELY, 
GOOD,  MEAN,  ANNOYED,  DEPRESSED,  ALARMED,  INSECURE,  WEARY,  ALERT, 
LAZY,  CONTENTED,  CHEERFUL,  SAD,  DOWNCAST,  SATISFIED,  ANGRY,  LOW, 
AFRAID,  BURNED  UP,  DROWSY,  CALM,  IRRITATED,  JITTERY,  VIGOROUS, 
PLEASED,  ACTIVE,  HAPPY,  STEADY,  HOPELESS,  SLUGGISH 

Stanford Sleepiness Scale. The Stanford Sleepiness Scale is a scale designed to assess
the level of sleepiness experienced by the subject. The subject simply responds to the scale by
selecting the statement that best describes the level of sleepiness at that moment.

ILLUSTRATION of STANFORD SLEEPINESS SCALE

CHOOSE  ONE  OF  THE  SEVEN  STATEMENTS  BELOW  WHICH  BEST  DESCRIBES 
YOUR  PRESENT  FEELING.  HOW  YOU  FEEL  RIGHT  NOW. 

1.  Feeling  active  and  vital;  alert;  wide  awake. 

2.  Functioning  at  a  high  level,  but  not  at  peak;  able  to  concentrate. 

3. Relaxed; awake, responsive, but not at full alertness.

4.  A  little  foggy;  let  down;  not  at  peak. 

5.  Foggy;  slowed  down;  beginning  to  lose  interest  in  remaining  awake. 

6.  Sleepy;  woozy;  prefer  to  be  lying  down;  fighting  sleep. 

7.  Almost  in  reverie;  sleep  onset  soon;  losing  struggle  to  remain  awake. 

4.4  Hardware  and  Software  Requirements 

4.4.1  STRES  Battery  Requirements 

Detailed hardware requirements for the UTC-PAB/AGARD STRES battery are found in the
manual  for  the  battery  (see  Reeves,  Winter,  LaCour,  Winter,  Vogel,  and  Grissett,  1990).  Briefly, 
the  STRES  battery  requires  an  IBM  AT  or  compatible  computer  running  at  a  minimum  of  8  MHz 
with  640  Kb  RAM.  A  10  Mb  hard  disk  is  also  required  with  at  least  one  5.25  inch  floppy  disk 
drive.  At  least  CGA  compatible  video  is  required  with  a  color  monitor.  The  system  also  requires 
either  a  Systems  Research  Laboratories,  Inc.  (SRL)  LabPak  or  Tecmar/SSI  Labmaster 
multifunction  data  acquisition  board.  An  analog  joystick  with  an  output  voltage  range  of  ±  5.0 
VDC  is  also  required.  The  software  was  written  in  the  C  programming  language  and  compiled  to 
produce  executable  task  modules. 

4.4.2  CTS  Requirements 

The  CTS  battery  is  implemented  on  the  Commodore  64  microcomputer.  The  system 
requirements include: (1) Commodore 64 microcomputer, (2) Commodore 1541 disk drive
(preferably two), (3) Epyx™ FastLoad™ cartridge (optional), (4) Commodore 1526 printer (or
compatible), (5) two Commodore 1702 color monitors (or equivalent), and (6) a custom keypad and
rotary control device. Detailed information on the CTS hardware requirements can be found in
Shingledecker  (1984).  The  CTS  software  was  written  in  Commodore  BASIC  with  calls  to  various 
machine  language  routines.  The  BASIC  source  code  was  compiled  to  produce  executable  task 
files.

4.4.3 WRAIR PAB Requirements


Hardware  specifications  for  the  WRAIR  PAB  can  be  found  in  the  WRAIR  PAB  manual 
(Thorne, 1990; see also Thorne, Genser, Sing, and Hegge, 1985). System requirements include:
(1)  at  least  an  IBM  or  IBM  compatible  AT  microprocessor  with  math  co-processor,  (2)  two  360 
Kb  5.25  inch  floppy  drives  or  a  20  Mb  hard  disk  drive,  (3)  640  Kb  RAM,  (4)  IBM  compatible  bus 
with four unused slots, (5) two RS232 serial ports and one parallel port, (6) EGA (128 Kb;
640 x 350) color graphics, and (7) SRL LabPak or other multifunction board. The software was
written  in  the  interpreted  Microsoft  BASIC  programming  language. 


Table 5. Hardware and Software Configuration.

STRES, WRAIR                            CTS

Zenith Z-248 PC                         Commodore 64 Computer
Math Co-processor                       2 Commodore 1541 Disk Drives
Internal Hard Drive (task software)     2 Commodore 1702 Monitors
360K Floppy Drive (subject data)        Four-Button Response Keypad
Zenith ZVM138 Monitor                   Rotary Tracking Controller
SRL Labpak Board                          Oklahoma - Bourns Potentiometer
Tracking Joystick                         Armstrong - Allen-Bradley Potentiometer
  (MS4M6676, OEM Controls Inc.)         Epyx™ Fastload™ Cartridge

NAMRL STRES Version 4.01 (JAN '91)      CTS Version 2.01 A
WRAIR PAB Version 3.42 (MAR '90)          (Version 2.01 modified for automatic task
GWBASIC Version 2.18                      sequencing, file naming, and data storage)

4.5  Testing  Facilities 

4.5.1 University of Oklahoma Facilities

All testing of University of Oklahoma subjects was conducted in a three-room suite in
laboratory space allocated to the Department of Psychology. One room (approximately 13 ft.
by 20 ft.) served as the microcomputer workstation site. Another room of approximately the
same size served as a data reduction and project management office. The third room served as
an auxiliary room for interviewing, orientation, and miscellaneous activities. All of these rooms
represent modern laboratory space with centrally controlled heating and air conditioning.
Temperature in the room was maintained at approximately 68 degrees Fahrenheit throughout the
testing session. Lighting in the room was modified through the use of three 40W indirect
incandescent lighting fixtures to reduce video screen glare.


Figure  2  presents  the  workstation  configuration  in  the  main  testing  room.  Four  STRES 
testing  stations  were  located  along  one  wall  just  to  the  right  of  the  entrance.  These  testing  stations 
were  approximately  3.0  ft.  wide  and  3.0  ft.  deep.  The  testing  stations  were  divided  by  acoustic 
panels  (3  inches  in  width).  Keyboards  and  controllers  were  placed  on  tables  at  the  testing  stations 
positioned at a height of approximately 28 inches. Monitors were placed on 10-inch high shelves
at the back of the table.

At  the  other  end  of  the  testing  room,  along  the  two  alternate  walls,  were  located  the  two 
WRAIR and two CTS testing stations. The WRAIR and CTS subject testing stations were of
approximately the same dimensions as the STRES testing stations. Due to a lack of
additional  acoustic  separation  panels,  these  pairs  of  subject  testing  stations  were  divided  by  large 
cardboard  panels  and  an  experimenter  control  station  (approximately  3  ft.  wide). 

The  adjoining  data  reduction  and  project  management  room  contained  a  complete  Commodore 
64  system  for  data  reduction,  an  IBM  compatible  microcomputer  for  data  reduction  and  transfer  to 
the  University  IBM  mainframe  computer,  and  a  terminal  for  data  analysis  on  the  mainframe 
computer. 

4.5.2  Armstrong  Laboratory  Facilities 

All  Armstrong  Laboratory  tests  were  conducted  in  a  laboratory  of  the  Psychology  Department 
at Wright State University. The room measured approximately 14 ft. by 10 ft. Temperature in the
room  varied  from  62  to  72  degrees  Fahrenheit.  The  room  was  illuminated  by  two,  4  ft.,  ceiling- 
mounted,  40W  fluorescent  fixtures. 

Two  testing  stations  were  located  at  opposite  ends  of  the  room.  No  subject  booths  or 
enclosures  were  used  because  only  one  subject  was  tested  at  a  time.  There  was  one  computer 
system  at  each  subject  station.  One  test  station  consisted  of  a  Zenith  Z-248  with  a  20  Mb  hard 
drive,  a  360  Kb  floppy  drive,  a  Zenith  EGA  card,  a  Zenith  ZVM-138  EGA  color  monitor,  a  math 
co-processor,  DOS  3.10,  and  Zenith  BIOS  3.12.  The  other  station  consisted  of  a  Commodore  64 
computer,  an  Epyx™  Fastload™  cartridge,  two  Commodore  1541  floppy  disk  drives,  a 
Commodore  1702  color  monitor,  and  the  response  keypad  and  tracking  task  controller  normally 
used  with  the  CTS  (Acton  and  Crabtree,  1985).  The  computer  systems  were  located  on  tables  that 
measured  42  inches  long  by  30  inches  deep,  and  were  adjusted  to  a  height  of  28  inches.  Seat 
height of the lightly padded, non-rolling chairs was fixed at approximately 19 inches. The
experimenter was seated in a sound-reduction cubicle such that he could observe subject
performance.

Figure 2. OU Test Facility Configuration.

4.6  Experimental  Procedure 

4.6.1  Subject  Recruitment,  Screening,  and  Orientation  Procedures 

Subjects  were  recruited  primarily  from  the  undergraduate  academic  community  at  the 
University  of  Oklahoma  and  Wright  State  University.  Following  institutional  review  board 
approval  of  the  project  on  each  respective  campus,  the  experimenters  posted  recruitment 
announcements  and  disseminated  information  about  the  project.  Students  who  indicated  an  interest 
in  the  project  either  through  bulletin  board  sign-up  procedures  or  by  contacting  the  experimenters 
were  assigned  an  orientation  appointment.  The  orientation  appointment  lasted  two  hours,  was 
conducted  individually,  and  consisted  of  screening  procedures  and  orientation  procedures. 

Upon  arrival  for  their  appointment,  subjects  completed  an  informed  consent  form  that 
described  the  general  nature  of  the  project  and  the  expectations  for  their  participation.  The  consent 
form  also  described  the  nature  of  the  payment  schedule  including  the  rate  of  payment  and  the  bonus 
agreement.  All  subjects  who  requested  orientation  interviews  agreed  to  participate.  Subjects  then 
completed  a  biographical  data  sheet  that  included  information  such  as  name,  local  and  permanent 
addresses,  phone  numbers,  and  appropriate  times  to  contact  them  during  the  day.  Subjects  were 
questioned  regarding  any  gross  hearing  or  vision  problems  and  the  use  of  any  medications, 
especially  central  nervous  system  stimulant  or  depressant  medications.  A  brief  visual  acuity 
examination  was  conducted  using  a  standard  Snellen  eye  chart.  This  was  to  confirm  that  all 
subjects  had  normal  or  corrected  vision  of  approximately  20/20,  and  certainly  no  worse  than 
20/30.  No  subjects  were  eliminated  at  either  testing  site  during  the  screening  process.  This  level 
of  success  in  recruiting  qualified  subjects  may  have  been  due  in  part  to  the  fact  that  the 
announcements  used  in  recruiting  subjects  listed  many  of  the  required  characteristics,  such  as 
native  English  speaking  males  with  hearing  and  vision  in  normal  ranges.  Thus,  subjects  who 
sought orientation appointments were already self-selected based on these announced restrictions.
Subjects  were  then  given  the  opportunity  to  ask  any  remaining  questions  they  had  about  the 
project,  the  payment  procedures,  or  their  commitment  to  participate.  They  were  then  scheduled  for 
training  and  testing  sessions.  At  this  point,  most  subjects  agreed  to  answer  some  ancillary 
questionnaires  that  required  about  23  minutes  to  complete. 

Following  the  screening  procedures,  subjects  were  provided  an  individual  orientation  to  the 
tasks  being  evaluated  in  the  project.  Because  so  many  performance  tasks  were  involved  in  the 
project,  it  was  believed  that  prior  task  description  and  instruction,  and  a  brief  exposure  to  each  task 
type  would  aid  subjects  in  mastering  and  performing  the  task  batteries.  As  a  result,  the  first 
training  day  required  much  less  task  description  and  instruction,  and  more  time  was  devoted  to 
actual  practice  on  the  tasks. 

Subjects  were  seated  in  front  of  a  microcomputer,  were  given  either  an  oral  or  written 
description  of  each  task,  were  given  an  opportunity  to  ask  questions  and  clarify  their  understanding 
of  the  task,  and  were  then  given  an  opportunity  to  perform  a  brief  orientation  trial  of  the  task 
usually  lasting  no  more  than  approximately  one  minute.  As  noted  previously,  the  STRES  and  CTS 
batteries  share  similar  versions  of  numerous  tasks.  Thus,  subjects  were  not  instructed  on  all  tasks, 
but  were  informed  that  variations  of  certain  tasks  would  be  encountered  during  actual  training  and 
testing  sessions.  The  subjects  were  provided  orientation  trials  on  the  Grammatical  Reasoning, 
Mathematical  Processing,  Memory  Search,  Spatial  Processing,  Unstable  Tracking,  and  Reaction 
Time  tasks  from  the  STRES  battery,  as  well  as  the  Manikin  task  from  the  WRAIR  PAB.  These 
constituted  the  most  complex  tasks  or  those  tasks  with  the  most  complex  instructional  sets.  This 
orientation  procedure  lasted  approximately  one  hour. 

4.6.2  Features  of  the  Testing  Protocol 

As  noted  in  the  Project  Design  section  above,  this  project  required  a  complex  design  to 
accomplish  its  many  goals.  Constructing  the  design  included  concern  for  such  issues  as  achieving 
adequate  training  to  obtain  asymptotic  performance  on  the  various  tasks,  collecting  baseline  data  at 
an  optimal  period  in  the  project,  addressing  issues  such  as  task  sequence  and  battery  sequence 
effects,  scheduling  retesting  periods  to  obtain  meaningful  reliability  data,  and  providing  adequate 
testing  time  for  the  exploration  of  basic  research  questions.  Reference  to  Figure  1  will  reveal  a 
number  of  unique  design  features  constructed  to  meet  these  competing  research  needs. 

Training,  baseline,  and  additional  data  collection  at  the  University  of  Oklahoma  was  conducted 
during  the  five  weeks  noted  in  Figure  1.  An  additional  week  prior  to  these  sessions  was  needed 
for  subject  recruitment  and  orientation  appointments  as  noted  in  Section  4.6.1  above.  Thus,  the 
project  protocol  required  a  complete  data  collection  cycle  of  about  six  weeks.  The  laboratory 
configuration  at  the  University  of  Oklahoma  accommodated  eight  subjects  during  each  two-hour 
training/test  session.  The  four  sessions  scheduled  each  day  allowed  for  the  testing  of  32  subjects 
per  data  collection  cycle.  Two  complete  data  collection  cycles  were  needed  to  acquire  data  from  the 
desired  number  of  subjects,  i.e.,  a  total  of  64  subjects. 

Armstrong  Laboratory  subjects  required  a  shorter,  two  and  one-half  week  data  collection  cycle 
because  they  were  not  involved  in  the  reliability,  deadline,  and  extended  trial  data  collection. 
Because  Armstrong  Laboratory  subjects  were  run  individually,  a  maximum  of  four  subjects  in 
two-hour  sessions  could  be  scheduled  for  any  data  collection  cycle.  Four  cycles  were  needed  to 
collect  the  desired  number  of  Armstrong  Laboratory  subjects,  i.e.,  a  total  of  16  subjects. 
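
The capacity figures above follow from simple arithmetic, sketched here in Python. The function names are illustrative and not part of the project software. The sketch also assumes that the Armstrong Laboratory ran four daily sessions with one subject each, which is one reading of "a maximum of four subjects in two-hour sessions".

```python
# Capacity arithmetic for the two testing sites, using figures stated in the text.
# Assumption: each subject keeps the same daily session slot for a whole cycle.

def subjects_per_cycle(seats_per_session: int, sessions_per_day: int) -> int:
    """Number of subjects one data collection cycle can accommodate."""
    return seats_per_session * sessions_per_day


def cycles_needed(target_subjects: int, per_cycle: int) -> int:
    """Smallest number of cycles whose combined capacity reaches the target."""
    return -(-target_subjects // per_cycle)  # ceiling division


# University of Oklahoma: eight workstations, four sessions per day.
ou_per_cycle = subjects_per_cycle(seats_per_session=8, sessions_per_day=4)
ou_cycles = cycles_needed(64, ou_per_cycle)

# Armstrong Laboratory: subjects run individually (assumed four sessions per day).
al_per_cycle = subjects_per_cycle(seats_per_session=1, sessions_per_day=4)
al_cycles = cycles_needed(16, al_per_cycle)

print(ou_per_cycle, ou_cycles)  # 32 2
print(al_per_cycle, al_cycles)  # 4 4
```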

Data  from  the  Armstrong  Laboratory  subjects  during  the  training  and  baseline  sessions,  and 
the  corresponding  data  from  the  University  of  Oklahoma  subjects,  provided  the  database  for 
exploring  training  requirements  on  the  task  batteries,  establishing  the  normative  data  for  the  various 
tasks,  assessing  fairly  immediate  test-retest  reliability  levels,  and  comparing  the  effects  of  group 
versus  individual  subject  task  administration.  That  the  data  were  collected  in  different  laboratories, 
with  different  laboratory  personnel,  and  with  different  computer  equipment  (although  the 
equipment  models  were  the  same),  provided  the  opportunity  to  assess  the  robustness  of  the 
software,  hardware  and  testing  procedures  to  such  subtle  testing  influences. 

The  additional  data  collected  at  the  University  of  Oklahoma  were  designed  to  address  other 
research  questions.  Repeated  baseline  testing  sessions  in  Weeks  3  and  5  (labeled  "R"  on  Figure 
1),  provided  additional  data  for  reliability  analyses  over  longer  time  periods.  Having  more  than 
one  retest  session  during  these  weeks  provided  the  opportunity  for  subjects  to  overcome  any 
declines  in  performance  efficiency  that  might  have  occurred  due  to  the  passage  of  time.  By 
re-establishing  baseline  performance  in  this  cost-efficient  manner,  the  subjects  were  then  fully  capable 
of  providing  additional  data.  This  situation  was  capitalized  upon  by  following  the  repeat  baseline 
data  collection  sessions  with  testing  sessions  aimed  at  answering  questions  of  basic  theoretical 
significance  (i.e.,  questions  addressing  deadline  and  extended  trial  effects). 

4.6.3  Training  and  Testing  Procedures 

Following  subject  recruitment,  screening,  and  task  orientation,  the  subjects  completed  one 
week  (i.e.,  five  consecutive  days)  of  training  sessions.  The  training  sessions  each  lasted  two 
hours.  During  the  first  session,  each  task  was  introduced  by  the  experimenter  and  described. 
(Specific  battery  and  task  sequences  are  described  in  Section  4.6.4.)  Because  the 
subjects  had  prior  orientation  trials  on  most  of  the  tasks,  a  simple  description  was  all  that  was 
needed  for  the  subjects  to  begin  performing  the  task.  Because  some  tasks  were  difficult  or  had 
complex  instructional  sets  (e.g.,  Grammatical  Reasoning,  STRES  Reaction  Time  task),  the 
experimenter  had  to  take  additional  care  in  presenting  the  task  and  close  scrutiny  was  given  each 
subject's  performance  to  ensure  correct  understanding.  Those  tasks  that  were  not  included  in  the 
prior  orientation  had  to  be  carefully  presented  for  the  first  time,  again  with  close  scrutiny  of  the 
subject's  performance.  Subsequent  training  sessions  required  little  additional  instruction,  although 
the  experimenters  were  always  scrutinizing  the  subject's  performance  to  ensure  understanding  and 
compliance  with  task  requirements. 




Following  this  first  week  of  training  sessions,  the  subjects  returned  after  the  weekend  for 
baseline  sessions  on  the  first  two  days  of  the  second  week.  During  these  baseline  sessions,  the 
task  testing  sequences  for  each  subject  remained  the  same  as  those  performed  in  the  training 
sessions. 

In  the  third  week  of  each  data  collection  cycle,  half  of  the  University  of  Oklahoma  subjects 
(N=16)  returned  for  two  days  of  retest  reliability  testing.  These  data  were  collected  five  days 
following  the  last  test  session  and  are  referred  to  as  the  "one-week"  retest  interval  data 
throughout  this  report.  The  task  testing  sequence  for  each  subject  was  the  same  as  that  assigned 
during  training  and  baseline  sessions.  The  remaining  three  days  of  the  third  week  were  used  for 
investigations  of  response  deadlines  (see  Section  4.6.5)  or  extended  trial  lengths  (see  Section 
4.6.6).  On  these  days,  subjects  received  trials  with  deadlines  of  varying  lengths  or  with  varying 
trial  lengths  imposed.  In  the  case  of  varying  trial  lengths,  fewer  actual  trials  were  possible. 

No  testing  was  conducted  during  the  fourth  week  of  either  data  collection  cycle.  During  the 
fifth  week  of  each  data  collection  cycle,  the  remaining  half  of  the  subjects  in  that  cycle  (N=16) 
returned  for  retest  reliability  testing  in  the  same  manner  as  described  previously  for  the  one-week 
retest  sessions.  While  these  data  were  collected  19  days  after  the  last  previous  test  session,  they 
are  referred  to  as  the  "three-week"  retest  interval  data  throughout  this  report.  As  before,  the 
three  remaining  days  of  the  fifth  week  were  used  for  additional  investigations  of  deadline  and 
extended  trial  effects.  Deadline  testing  was  conducted  with  the  one-week  retest  subjects  during  the 
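first  cycle  and  with  the  three-week  retest  subjects  during  the  second  cycle.  Extended  trial  testing 
followed  the  reverse  assignment:  the  three-week  retest  subjects  during  the  first  cycle  and  the 
one-week  retest  subjects  during  the  second  cycle. 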

At  the  conclusion  of  the  two  data  collection  cycles,  64  subjects  from  the  University  of 
Oklahoma  and  16  Armstrong  Laboratory  subjects  had  been  trained  and  tested  on  baseline 
conditions.  Thirty-three  subjects  from  the  University  of  Oklahoma  had  been  tested  again  at  the 
one-week  retest  baseline  period  along  with  additional  deadline  and  extended  trial  testing,  and  thirty- 
one  University  of  Oklahoma  subjects  had  been  tested  again  at  the  three-week  retest  baseline  period 
along  with  additional  deadline  and  extended  trial  testing. 

4.6.4  Battery  and  Task  Sequences 

Task  sequence  and  battery  sequence  effects  were  assessed  throughout  the  study.  Four 
subgroups  of  approximately  twenty  subjects  each  were  tested  using  different  task  sequences.  This 
includes  approximately  sixteen  subjects  from  the  University  of  Oklahoma  and  four  Armstrong 
Laboratory  subjects. 

During  each  of  the  training,  baseline,  and  repeat  baseline  sessions,  half  of  the  subjects 
performed  two  trials  of  each  STRES  battery  configuration  during  the  first  hour  while  the  other  half 
of  the  subjects  performed  one  session  of  the  CTS  and  WRAIR  PAB  tasks.  The  subjects  switched 
workstations  during  the  second  hour  so  that  each  subject  could  complete  all  tasks.  Within  each 
hour,  those  subjects  performing  the  CTS  and  WRAIR  PAB  configurations  switched  workstations 
halfway  through.  Major  orderings  of  the  batteries  (and  specific  task  orderings  within  the  batteries) 
were  presented  in  counterbalanced  fashion.  Table  6  presents  the  counterbalanced  sequences  of  the 
batteries.  Subjects  were  randomly  assigned  to  one  of  the  four  task  battery  sequences.  Subjects 
were  trained  and  tested  according  to  their  assigned  battery  sequence  throughout  the  project.  That 
is,  a  subject  did  not  change  sequences  at  different  stages  in  the  project. 

To  construct  the  battery  sequences  described  in  Table  6,  the  entire  complement  of  tasks  used 
in  the  project  was  divided  into  two  major  orderings.  One  of  the  major  orderings  began  with  the 
administration  of  the  WRAIR  Sleep/Mood  scales  followed  by  a  trial  of  most  of  the  STRES  battery 
tasks  performed  in  one  of  four  fixed  orderings  (described  below).  After  a  short  break,  the  subject 
performed  the  second  trial  on  the  STRES  tasks  in  the  same  task  order  as  before,  and  finished  this 
major  ordering  by  completing  the  WRAIR  Sleep/Mood  scales  a  second  time.  This  major  ordering 
took  subjects  approximately  one  hour  to  complete. 

The  other  major  task  ordering  consisted  of  the  counterbalanced  presentation  of  the  CTS  tasks 
(one  trial  each  in  one  of  four  task  orders  described  below)  and  the  WRAIR  PAB  tasks  along  with 
the  STRES  Reaction  Time  task.  The  WRAIR  Sleep/Mood  scales  were  not  administered  during  this 
major  ordering.  This  major  ordering  required  approximately  one  hour  to  complete. 

These  major  orderings  of  the  batteries  were  counterbalanced  to  produce  the  sequences 
presented  in  Table  6.  For  example,  the  first  column  in  Table  6  presents  one  of  the  sequences 
consisting  of  the  STRES  major  ordering  followed  by  the  CTS/WRAIR  PAB  major  ordering.  The 
exact  number  of  subjects  performing  each  sequence  is  noted  under  the  column  heading. 

Embedded  within  the  counterbalanced  battery  sequences  were  four  sets  of  pseudo-random 
task  orderings.  Table  7  presents  these  various  task  orderings.  It  was  not  feasible  to  present  all 
possible  task  orderings.  Therefore,  four  task  orderings  were  selected  on  a  rational  basis.  The  first 
was  the  ordering  specified  by  the  AGARD  STRES  manual  (AGARD,  1989).  The  second  ordering 
was  the  ordering  used  in  the  development  of  the  CTS  normative  database  (Schlegel  and  Gilliland, 
1990).  The  remaining  two  task  orderings  were  constructed  through  random  selection  with  the  only 
restriction  being  that  they  not  resemble  the  other  existing  task  orderings.  For  a  particular  subject, 
the  same  task  order  was  used  for  both  the  CTS  and  STRES  tasks.  To  achieve  a  combined  balance 
across  the  sixteen  combinations  of  battery  sequence  and  task  order,  the  same  number  of  subjects 
(four  University  of  Oklahoma  subjects  and  one  Armstrong  Laboratory  subject)  was  assigned  to 
each  combination. 
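
One way to obtain the balance described above is to cross the four battery sequences with the four task orders and deal shuffled subjects evenly across the sixteen resulting cells. The Python sketch below illustrates that idea only. The report does not describe the actual assignment mechanics, and the subject identifiers, cell labels, and random seed are hypothetical.

```python
import itertools
import random
from collections import Counter


def assign_cells(subject_ids, cells, rng):
    """Shuffle subjects, then deal them round-robin so every cell gets an equal share."""
    ids = list(subject_ids)
    rng.shuffle(ids)
    return {sid: cells[i % len(cells)] for i, sid in enumerate(ids)}


battery_sequences = [1, 2, 3, 4]  # placeholder labels for the Table 6 columns
task_orders = [1, 2, 3, 4]        # placeholder labels for the Table 7 orderings
cells = list(itertools.product(battery_sequences, task_orders))  # 16 combinations

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
ou = assign_cells([f"OU{i:02d}" for i in range(64)], cells, rng)
al = assign_cells([f"AL{i:02d}" for i in range(16)], cells, rng)

ou_counts = Counter(ou.values())
al_counts = Counter(al.values())
print(len(cells), set(ou_counts.values()), set(al_counts.values()))
```

Assigning the two sites separately keeps the site-by-cell counts equal, which matches the stated four University of Oklahoma subjects and one Armstrong Laboratory subject per combination.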


Table  7.  Task  Orderings. 

4.6.5  Imposed  Deadline  Procedures 

Thirty-three  subjects  participated  in  three  days  of  testing  under  response  deadline  conditions. 
The  purpose  of  the  deadline  testing  was  to  determine  if  the  added  time  pressure  would  be  beneficial 
or  detrimental  to  subject  performance.  Possible  effects  may  include  a  reduced  mean  RT  for  correct 
responses,  a  smaller  standard  deviation  in  RT,  and  an  increase  in  the  number  of  missed  responses 
(with  a  corresponding  decrease  in  percentage  correct).  Of  additional  interest  was  identification  of 
changes  in  the  shape  of  the  RT  probability  distribution  and  investigation  of  the  ability  to  generate 
the  unconstrained  distribution  from  knowledge  of  the  deadline  distribution.  Only  the  STRES 
versions  of  Grammatical  Reasoning,  Mathematical  Processing,  Memory  Search  (2-  and  4- 
character)  and  Spatial  Processing  were  used.  The  STRES  COMBO  task  was  also  performed  in 
place  of  the  Unstable  Tracking  task  in  the  subject's  normally  assigned  task  sequence.  Based  on  a 
summary  of  the  data  from  the  second  baseline  session  for  the  first  cycle  of  the  University  of 
Oklahoma  subjects  and  based  on  previous  CTS  standardization  data,  deadlines  were  established  at 
approximately  the  mean  RT  plus  one  standard  deviation  (short  deadline)  and  the  mean  RT  plus  two 
standard  deviations  (moderate  deadline).  Table  8  presents  the  specific  deadlines  for  each  task. 
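
The deadline rule described above (mean RT plus one or two standard deviations) can be sketched in Python as follows. The reaction times are invented for illustration, and the use of the sample standard deviation is an assumption, since the report does not say which estimator was used. The actual deadlines in Table 8 were set only "approximately" at these values.

```python
import statistics


def deadlines_ms(correct_rts_ms):
    """Return short and moderate deadlines from a set of correct-response RTs (msec)."""
    mean_rt = statistics.mean(correct_rts_ms)
    sd_rt = statistics.stdev(correct_rts_ms)  # sample SD (assumed estimator)
    return {
        "short": mean_rt + sd_rt,         # mean + 1 SD
        "moderate": mean_rt + 2 * sd_rt,  # mean + 2 SD
    }


# Hypothetical baseline RTs for one task, not data from the study.
sample_rts = [900, 1000, 1100, 1200, 1300]
d = deadlines_ms(sample_rts)
print(round(d["short"], 1), round(d["moderate"], 1))  # 1258.1 1416.2
```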


Table  8.  Response  Deadlines  (msec). 

The  subjects  were  assigned  to  one  of  two  groups  (16  subjects  in  the  short  deadline  group,  17 
subjects  in  the  moderate  deadline  group)  with  an  attempt  made  to  achieve  an  overall  performance 
balance  between  groups  across  all  tasks.  An  outline  of  the  three  days  of  testing  is  provided  in 
Table  9.  On  the  first  day  (Wednesday),  each  subject  performed  the  STRES  battery  four  times, 
with  the  tasks  arranged  in  the  same  sequence  used  by  that  subject  for  training  and  baseline.  In  the 
first  session,  response  deadlines  were  used  according  to  the  group  to  which  the  subject  was 
assigned  (short  or  moderate).  The  deadline  was  indicated  to  the  subject  by  the  disappearance  of  the 
stimulus  from  the  display.  Subjects  were  encouraged  to  respond  before  the  stimulus  disappeared. 
Responses  made  after  the  stimulus  was  removed  were  not  recorded.  In  the  second  session, 
subjects  were  given  the  same  instruction  set  but  the  stimuli  were  not  actually  removed  from  the 
display  (pseudo-deadline).  This  was  done  in  order  to  record  all  responses  with  the  subject 
believing  he  was  working  under  deadline  conditions.  The  task  software  did  not  allow  removal  of 
the  stimulus  at  the  deadline  while  continuing  to  record  responses  after  the  deadline.  The  last  two 
sessions  of  the  first  day  served  as  training  sessions  under  the  actual  deadlines  assigned  to  that 
subject  (short  or  moderate). 
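
The distinction between actual and pseudo-deadline sessions matters because a hard deadline discards the slow tail of the RT distribution. The Python sketch below shows this purely mechanical effect on invented reaction times. It does not model any change in subject behavior under time pressure, and treating a response exactly at the deadline as recorded is an assumption.

```python
import statistics


def apply_deadline(rts_ms, deadline_ms):
    """Split trials into recorded responses and misses under a hard deadline.

    Responses slower than the deadline are dropped, as in the actual
    deadline sessions; a pseudo-deadline session would keep them all.
    """
    recorded = [rt for rt in rts_ms if rt <= deadline_ms]
    missed = len(rts_ms) - len(recorded)
    return recorded, missed


# Hypothetical unconstrained RTs (msec), not data from the study.
rts = [700, 800, 850, 900, 950, 1000, 1100, 1250, 1500, 1900]
recorded, missed = apply_deadline(rts, deadline_ms=1300)

print(missed)                                             # 2
print(statistics.mean(rts), statistics.mean(recorded))    # 1095 943.75
print(statistics.stdev(recorded) < statistics.stdev(rts)) # True
```

Even with no behavioral change, the recorded mean and standard deviation shrink and the miss count rises, which is why the pseudo-deadline sessions were needed to observe the full distribution.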


Table  9.  Schedule  for  Deadline  Testing. 

On  the  second  day  (Thursday),  four  sessions  were  conducted.  Following  a  training  session 
under  the  subject's  assigned  deadline  condition,  each  subject  performed  one  session  under  no 
deadlines,  short  deadlines,  and  moderate  deadlines.  The  order  of  deadline  presentation  was 
counterbalanced  across  subjects. 
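
A complete counterbalancing of three deadline conditions uses all six possible orders. The Python sketch below shows one such scheme, cycling subjects through the six orders. This is an assumption for illustration: the report states only that the order was counterbalanced across subjects and does not give the scheme used.

```python
import itertools
from collections import Counter

conditions = ("no deadline", "short", "moderate")
orders = list(itertools.permutations(conditions))  # all 6 possible orders


def order_for(subject_index: int):
    """Cycle through the six orders so they are used as evenly as possible."""
    return orders[subject_index % len(orders)]


# With 33 subjects, three orders are used six times and three are used five times.
usage = Counter(order_for(i) for i in range(33))
print(len(orders), sorted(usage.values()))  # 6 [5, 5, 5, 6, 6, 6]
```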

On  the  final  day  (Friday),  three  sessions  were  conducted,  one  each  under  no  deadline,  pseudo 
short  deadlines,  and  pseudo  moderate  deadlines.  Subject  instructions  provided  the  information  as 
to  the  condition  under  which  the  subject  should  perform  the  session.  The  various  subject 
instructions  for  all  three  days  and  for  the  different  conditions  are  provided  in  Appendix  A. 
Following  the  sessions,  subjects  were  debriefed  to  get  their  impressions  of  the  entire  study. 

4.6.6  Extended  Trial  Procedures 

The  extended  trial  portion  of  this  research  project  addressed  a  circumscribed  task  battery 
administration  issue  with  extensive  implications  for  basic  research.  At  an  applied  level,  the 
extended  trial  study  examined  whether  the  data  collected  in  the  standard  three-minute  trial 
corresponded  to  data  collected  in  longer  trial  periods.  This  is  an  important  question  if  users  of  task 
batteries  make  generalizations  to  "real  world"  work  performance  from  baseline  data  on  three- 
minute  tasks.  These  generalizations  may  be  markedly  in  error.  Such  probable  discrepancies  may 
be  due  to  the  differences  found  between  data  collected  during  a  short  three-minute  period,  when  a 
subject  can  recruit  all  available  performance  resources,  as  opposed  to  longer  periods  where 
conservation  and  regulation  of  resources  are  required.  Thus,  at  a  simplistic  level,  longer  trial 
lengths  should  lead  to  poorer  overall  performance  due  to  greater  demands  on  resources. 

At  another  level,  this  question  strikes  at  the  core  of  research  in  such  areas  as  stress  and 
adaptation.  Theories  of  stress  and  adaptation  may  provide  important  cues  in  understanding  the 
differences  found  between  data  collected  at  varying  trial  lengths.  Perhaps  even  more  important,  by 
utilizing  advances  in  task  batteries  and  workload  technology,  such  extended  trial  studies  may  play 
an  important  role  in  testing  stress  theories  and  developing  new  theories  of  adaptive  responding. 
The  present  project  provided  the  opportunity  to  investigate  this  domain  at  a  rudimentary  level.  By 
collecting  successive  trials  of  data  on  the  same  task  over  varying  periods  of  time,  simple  yet 
informative  questions  could  be  answered  about  the  ability  of  subjects  to  respond  consistently  over 
time. 

Only  a  limited  opportunity  for  collecting  such  data  existed  within  the  testing  time  limits  of  this 
project.  Thus,  data  could  not  be  collected  on  all  tasks.  Preference  was  given  to  the  STRES  battery 
tasks,  specifically  those  that  could  also  be  generalized  to  CTS  data  (i.e.,  Unstable  Tracking, 
Memory  Search,  Grammatical  Reasoning,  Mathematical  Processing,  and  Spatial  Processing).  The 
highest  priority  was  given  to  the  Unstable  Tracking  task  due  to  the  continuous  nature  of  this  task. 
The  other  tasks  present  the  subject  with  discrete  problems  that  are  separated  by  short  periods  of 
time  that  could  be  used  as  intermittent  rest  periods.  The  Unstable  Tracking  task  is  continuous,  thus 
providing  a  constant  demand  on  resources. 

Unfortunately,  the  existing  task  software  would  not  permit  extending  the  discrete  trial  length 
beyond  the  standard  three-minute  period.  To  produce  extended  trial  lengths,  multiple  three-minute 
trials  were  administered  in  rapid  succession  minimizing  any  rest  period  between  trials.  The 
extended  trial  lengths  selected  included  multiple  three-minute  epochs  totaling  6-minute,  12-minute, 
and  24-minute  trials.  Figure  3  presents  these  extended  trials.  It  can  be  seen  from  Figure  3  that 
various  comparisons  of  the  three-minute  trial  epochs  provide  a  rudimentary  means  of  answering 
numerous  questions.  For  example,  the  combinations  labeled  "C"  represent  the  comparisons  of  the 
Baseline  three-minute  trial  with  the  average  of  the  three-minute  epochs  for  each  of  the  longer  trial 
lengths.  At  a  fundamental  level  these  comparisons  can  answer  the  question  of  whether  overall 
performance  during  these  trial  lengths  is  equivalent.  Comparisons  labeled  "A"  provide  an 
assessment  of  whether  the  first  three  minutes  of  performance  during  the  varying  trial  lengths 
produce  similar  levels  of  performance.  If  subjects  regulate  their  performance  resources  to  match 
the  time  demands,  these  first  three-minute  epochs  may  not  yield  comparable  performance  levels. 
Comparisons  labeled  "B"  simply  provide  an  assessment  of  any  differences  between  the  first  epoch 
and  every  other  epoch  in  a  24-minute  trial.  Such  a  comparison  yields  information  regarding  the 
stability  of  performance  across  the  total  trial  length.  This  type  of  comparison  can  be  performed  for 
the  other  multi-epoch  trial  lengths.  Similar  comparisons  can  be  made  with  the  three-minute 
baseline  trial. 
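
The three comparison types described above can be expressed as simple operations on per-epoch scores. The Python sketch below uses invented scores (one value per three-minute epoch) and hypothetical function names. It computes raw differences only, whereas the analyses in the report would apply statistical tests to such quantities.

```python
from statistics import mean


def comparison_c(baseline, trials):
    """'C': mean epoch score of each longer trial minus the baseline trial score."""
    return {length: mean(epochs) - baseline for length, epochs in trials.items()}


def comparison_a(trials):
    """'A': first three-minute epoch of each trial length, for side-by-side comparison."""
    return {length: epochs[0] for length, epochs in trials.items()}


def comparison_b(epochs):
    """'B': change from the first epoch to each later epoch within one trial."""
    return [score - epochs[0] for score in epochs[1:]]


# Hypothetical scores keyed by trial length in minutes, not data from the study.
baseline = 50.0
trials = {
    6: [50.0, 48.0],
    12: [49.0, 47.0, 46.0, 46.0],
    24: [48.0, 47.0, 45.0, 44.0, 44.0, 43.0, 42.0, 41.0],
}

c = comparison_c(baseline, trials)
a = comparison_a(trials)
b = comparison_b(trials[24])
print(c)  # {6: -1.0, 12: -3.0, 24: -5.75}
print(a)  # {6: 50.0, 12: 49.0, 24: 48.0}
print(b)  # [-1.0, -3.0, -4.0, -4.0, -5.0, -6.0, -7.0]
```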

During  the  weeks  of  retesting  (weeks  three  and  five),  the  first  two  days  of  each  week  were 
dedicated  to  retest  data  collection.  The  remaining  days  were  dedicated  to  deadline  or  extended  trial 
data  collection.  Due  to  the  limited  amount  of  time  for  such  data  collection  and  the  constraints  due 
to  each  two-hour  test  session,  the  presentation  of  all  counterbalanced  combinations  of  extended 
trial  length  was  not  possible.  As  a  result,  the  24-minute  trial  length  condition  was  given  priority 
because  it  was  the  condition  that  placed  the  greatest  degree  of  demand  on  the  subjects  and  provided 
the  longest  amount  of  time  on  task  to  observe  any  differences  that  might  occur. 

Table  10  presents  one  of  the  four  testing  protocols  for  the  extended  trial  length  sessions.  Not 
unlike  the  technique  used  for  presenting  the  battery  sequences,  major  orderings  of  trial  lengths 
were  constructed.  The  orderings  each  lasted  approximately  one  hour  so  that  two  of  them  could  be 
accommodated  within  the  overall  two-hour  test  session  time  constraint.  One  group  of  orderings 
included  two  24-minute  trials  (never  the  same  task).  Because  these  24-minute  trials  were  expected 
to  be  rather  fatiguing,  no  more  than  two  such  trials  were  administered  on  any  test  day.  This  meant 
that  the  second  group  of  major  orderings  included  combinations  of  6-minute  and  12-minute  trials. 
Unfortunately,  due  to  technical  constraints,  it  was  never  possible  to  administer  the  6-minute  trials 
first  within  this  group  of  orderings,  and  thus  within  any  test  session. 

Figure  3.  Extended  Trial  Analyses. 

Table  10.  Schedule  for  Extended  Trial  Study  -  Group  A2. 

On  any  specific  test  day,  subjects  received  in  counterbalanced  order  a  combination  of  the  two 
major  groupings.  Following  the  example  in  Table  10,  during  the  first  hour  of  Wednesday's  test 
session,  subjects  were  administered  a  24-minute  trial  of  the  Unstable  Tracking  task  followed  by  a 
24-minute  trial  of  the  Sternberg  task.  During  the  second  hour,  the  subjects  received,  in  order,  a 
12-minute  trial  of  Spatial  Processing,  6  minutes  of  the  COMBO  task,  6  minutes  of  Mathematical 
Processing,  6  minutes  of  the  COMBO  task,  12  minutes  of  Mathematical  Processing  and  a  standard 
3-minute  trial  of  the  COMBO  task.  As  shown  in  the  table,  four  sets  of  Sleep/Mood  data  were 
obtained  during  the  two-hour  period.  Similar  schedules  were  used  for  Thursday  and  Friday. 
Subjects  were  assigned  to  one  of  four  extended  trial  schedules  to  achieve  a  balance  among  the 
groups  based  on  overall  baseline  performance.  All  four  schedules  are  provided  in  Appendix  B. 

The  specialized  testing  periods  dedicated  to  deadline  and  extended  trial  length  also  provided 
one  of  the  few  opportunities  to  collect  data  on  other  tasks  of  interest.  For  that  reason,  subjects 
were  also  administered  the  STRES  COMBO  task  (a  dual-task  combination  of  Memory  Search  with 
four  characters  and  Unstable  Tracking).  This  task  was  simply  included  in  the  task  administration 
protocol  on  these  test  days.  Although  performing  these  two  tasks  in  combination  was  new  to  the 
subjects,  the  tasks  themselves  were  well-practiced  at  this  point.  Thus,  the  introduction  of  the 
COMBO  task  at  this  point  in  the  study  was  not  viewed  as  disruptive  in  any  sense. 

4.6.7  Debriefing  Procedures 

Following  the  completion  of  training  and  test  sessions,  subjects  participated  in  a  debriefing 
session.  These  debriefing  sessions  lasted  approximately  30  minutes  and  consisted  of  an  oral 
interview  with  the  experimenter(s).  Table  11  presents  the  general  contents  of  the  debriefing 
interview.  During  these  interviews,  the  experimenters  attempted  to  determine  the  subjects'  general 
response  to  the  study  and  specific  information  about  the  tasks  and  batteries.  Experimenters 
questioned  the  subjects  regarding  the  nature  of  specific  tasks,  as  well  as  comparisons  between 
tasks.  Subjects  were  also  asked  to  provide  information  on  strategies  used  to  perform  the  tasks  and 
points  at  which  they  may  have  changed  those  strategies.  Hardware  and  software  were  evaluated, 
as  were  lab  accommodations,  initial  instructions  and  staff  support.  Finally,  those  subjects  that 
participated  in  additional  trials  investigating  deadline  and  extended  trial  effects  were  asked  to 
comment  on  that  experience. 

Table  11.  Subject  Debriefing  Topics. 


Overall  Impressions 

Most  Preferred/Least  Preferred  Tasks 

Strategies  to  Improve  Performance  ("tricks") 

    Grammatical  Reasoning  Symbols  (40%  of  subjects) 
    Mathematical  Processing  Digits 
    Spatial  Processing  Bar  Heights 
    Stimulus  Sequence  Memorization  (STRES  Spatial) 

Comparison  of  CTS  vs.  STRES  Hardware  and  Software 

Comments  on  the  Sleep  and  Mood  Scales 

Departure  from  Standard  Instructions  (e.g.,  finger  placement) 

Opinions  on  Response  Deadlines  and  Extended  Trials 

Adequacy  of  Orientation  Session 

Treatment  by  Experimental  Staff 

5.0  PROJECT  RESULTS 


This  chapter  of  the  report  presents  the  data  and  analyses  from  this  project  beginning  with  a 
description  of  the  magnitude  of  the  data  collection  effort  and  some  characteristics  of  the  database. 
A  brief  discussion  of  the  results  is  included  within  each  section.  The  first  major  section  of  this 
chapter  presents  baseline  and  training  data  for  each  of  the  task  batteries.  The  presentation  format  is 
of  a  summary  nature  that  should  be  particularly  useful  to  those  researchers  interested  in  normative 
response  patterns  for  each  task.  In  the  body  of  the  report,  emphasis  is  placed  on  measures 
typically  of  interest  to  researchers  (i.e.,  mean  and  standard  deviation).  Also  included  for  the 
convenience  of  the  reader  are  tables  with  selected  percentile  groupings  which  allow  classification  of 
subjects  into  performance  categories.  Graphs  for  each  major  task  measure  and  other  detailed 
information  are  included  in  numerous  appendices. 

Following  the  first  section  on  baseline  and  training  data  are  sections  that  present  the  results  of 
other  analyses  including  task  measure  reliability,  comparisons  across  task  batteries,  group  versus 
individual  testing  procedures,  task  order  and  battery  sequence  effects,  effects  of  deadline 
conditions,  effects  of  extended  trial  length,  and  the  usefulness  of  the  psychometric  state  measures. 

It  is  important  to  note  here  that  the  analysis  of  the  individual  versus  group  training  effect  (i.e., 
the  comparison  of  University  of  Oklahoma  and  Armstrong  Laboratory  subjects)  yielded  no 
significant  differences  of  any  importance.  This  analysis  will  be  discussed  in  more  detail  in  Section 
5.4.  As  a  result  of  this  finding,  the  data  from  subjects  at  the  University  of  Oklahoma  and  the 
Armstrong  Laboratory  were  combined.  The  data  representing  training  and  baseline  performance 
reflect  the  total  sample  of  79  subjects  with  one  major  exception,  the  CTS  Unstable  Tracking  task. 
The  major  summaries  and  analyses  for  this  task  include  only  the  CTS  Unstable  Tracking  data  from 
the  University  of  Oklahoma  (see  Section  5.1.3). 

5.1  General  Normative  Database 

This  project  involved  the  collection  of  a  massive  database.  Only  a  portion  of  those  data  is 
summarized  within  this  report.  Table  4  presented  a  list  of  the  primary  performance  measures 
collected  and  analyzed  and  Table  12  presents  a  summary  of  the  data  collection  effort.  Over  20,000 
data  observations  (subjects  x  trials  x  tasks),  each  containing  numerous  dependent  measures,  were 
collected  and  analyzed.  More  than  140  dependent  measures  were  obtained  on  multiple  days,  and 
over  50  of  these  were  included  in  some  phase  of  the  analysis.  It  is  noteworthy  that  of  the  20,000 
plus  observations,  only  seven  were  lost  due  to  equipment  or  procedural  errors.  An  additional  100 
of  the  20,000  represented  outliers  that  were  removed  prior  to  the  summaries  and  analyses.  The 
majority  of  the  deleted  observations  were  due  to  identifiable  subject  errors. 

Table 12. Summary of Data Collection Effort.

More than 140 Dependent Measures
More than 50 Dependent Measures Analyzed
Five Training Days/Two Baseline Days/Two Retest Days (all subjects)
    STRES - two sessions per day
    CTS, RCT, WRAIR PAB - one session per day
Three Days of Response Deadline Testing (33 subjects)
    STRES - four sessions per day (including CBO Dual-Task)
Three Days of Extended Trial Testing (31 subjects)
    STRES - 6, 12, and 24 minutes of GRM, MTH, STN4, SPA, CBO
More than 20,000 observations
Seven lost observations due to hardware/procedural errors
One Oklahoma and one Armstrong Lab subject removed due to poor motivation
Approximately 100 (< 0.5%) outlier observations removed due to subject errors


This  project  required  subjects  to  return  to  the  respective  laboratories  each  day  for  two  hours, 
on  multiple  days,  across  multiple  weeks,  to  perform  the  same  sequence  of  fairly  repetitive 
performance  tasks.  In  general,  the  subjects  participating  in  this  project  understood  the  commitment 
in  time  and  effort  that  they  were  making  and  understood  the  value  of  their  contribution  to  this 
research.  This  was,  in  most  cases,  sufficient  to  maintain  adequate  motivation  in  the  participants. 
However,  the  data  from  two  subjects  (one  from  each  testing  site)  had  to  be  eliminated  due  to  poor 
motivation.  These  subjects  exhibited  chronic  tardiness,  missed  sessions,  and  provided  highly 
variable  data.  While  it  was  clear  that  on  any  trial  they  were  capable  of  providing  data  within  the 
typical  range  of  the  other  participants,  these  two  subjects  were  repeatedly  uncooperative  and  often 
provided  data  that  could  easily  be  characterized  as  "outlier"  in  nature.  For  these  reasons,  the  data 
from  these  two  subjects  were  eliminated  from  the  analysis. 

To facilitate the screening of outliers, the SAS Univariate Procedure was used separately on
the University of Oklahoma and the Armstrong Laboratory data sets. In addition to providing the
mean, standard deviation, upper quartile and lower quartile values, and a stem-and-leaf or
histogram plot, the procedure presents a box-and-whisker plot and identifies the five lowest and
the five highest data values. Through the box-and-whisker plot, SAS distinguishes between
extreme data values that are 1.5 to 3 interquartile ranges away from the nearest quartile point and
those that are more than 3 interquartile ranges away.
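
The two-level box-and-whisker rule described above can be sketched as follows. This is a minimal illustration in Python, not the SAS procedure itself. The function name classify_by_iqr, the labels, and the sample values are hypothetical, and the quartile method (the inclusive method of Python's statistics.quantiles) is an assumption that may differ slightly from the quartile definition SAS applies.

```python
from statistics import quantiles


def classify_by_iqr(values):
    """Label each value 'typical', 'mild' (1.5 to 3 IQRs beyond the
    nearest quartile), or 'extreme' (more than 3 IQRs beyond it)."""
    # Quartile method is an assumption; SAS may interpolate differently.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance beyond the nearest quartile (zero inside the box).
        distance = max(q1 - v, v - q3, 0.0)
        if distance > 3 * iqr:
            labels.append("extreme")
        elif distance > 1.5 * iqr:
            labels.append("mild")
        else:
            labels.append("typical")
    return labels


# Hypothetical mean response times for one task across trials.
rts = [500, 510, 520, 530, 540, 550, 560, 570, 580, 700, 1200]
for rt, label in zip(rts, classify_by_iqr(rts)):
    if label != "typical":
        print(rt, label)
```

In this hypothetical sample the quartiles are 525 and 575, so the value 700 falls in the 1.5 to 3 interquartile range band and 1200 falls beyond 3 interquartile ranges.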

For  each  task  in  each  battery,  a  listing  was  made  of  those  subjects  for  whom  potential  outlier 
trials  existed.  A  potential  outlier  trial  was  one  which  was  placed  in  either  extreme  category  on  the 
box-and-whisker  plot  for  response  time  (usually  a  long  mean  RT)  or  percentage  correct  (usually  a 
low  PC).  A  separate  identification  was  made  of  those  trials  which  differed  from  the  mean  by  more 
than  4  standard  deviations. 
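
The separate 4 standard deviation screen can be sketched in the same way. The function name beyond_k_sd and the data are hypothetical, and the use of the sample standard deviation (n - 1 denominator) computed over the full set of trials is an assumption; the report does not state which denominator or reference set was used.

```python
from statistics import mean, stdev


def beyond_k_sd(values, k=4.0):
    """Return indices of values more than k sample standard deviations
    from the mean of the full set."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator), an assumption
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) > k * sd]


# Hypothetical data: 30 ordinary trials plus one very slow trial.
trial_rts = [500] * 30 + [5000]
print(beyond_k_sd(trial_rts))  # [30]
```

One design point is worth noting: because the outlying value inflates the standard deviation it is compared against, no single value can lie more than (n - 1)/sqrt(n) sample standard deviations from the mean. A 4 standard deviation criterion therefore only becomes reachable with 18 or more values.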

Each  of  these  trials  was  closely  examined  to  determine  if  the  performance  was  consistent  with 
that  subject's  typical  performance.  In  a  few  cases,  a  particular  subject  was  identified  as  a  poor 
performer,  either  in  general  (as  with  the  two  eliminated  subjects  mentioned  earlier)  or  for  a  specific 
task.  In  all  cases,  reference  was  made  to  the  daily  subject  log  in  order  to  confirm  the  nature  or 
possible  cause  of  the  outlier  data.  Although  in  most  cases  the  procedure  identified  outliers  on  the 
poor  performance  side,  it  was  also  helpful  in  identifying  certain  subjects  who  changed  their  task 
performance  strategies  late  in  the  study  and  performed  much  better  than  their  typical  performance. 

The  following  narrative  summarizes  the  magnitude  of  the  outlier  elimination  for  each  task  in 
addition  to  the  two  subjects  completely  eliminated  from  the  database.  The  number  of  eliminated 
trials  is  from  a  total  of  1362  training,  baseline,  and  retest  trials  for  the  STRES  tasks  and  681  trials 
for  each  of  the  CTS  and  WRAIR  PAB  tasks  and  the  STRES  Reaction  Time  forms  (a  total  of 
17,706  trials).  For  the  STRES  battery,  the  summary  is  as  follows:  23  trials  involving  9  subjects 
for  GRM,  7  trials  involving  4  subjects  for  MTH,  4  trials  involving  3  subjects  for  STN2,  5  trials 
involving  3  subjects  for  STN4,  5  trials  involving  3  subjects  for  SPA,  and  9  trials  involving  4 
subjects  for  TRK.  In  addition,  all  Unstable  Tracking  data  for  one  subject  were  eliminated.  Across 
all  forms  for  STRES  Reaction  Time,  a  total  of  16  trials  involving  11  subjects  were  removed,  in
addition  to  all  of  one  session  for  one  subject  and  all  of  the  CODED  form  data  for  one  subject. 
Outlier  removal  for  the  CTS  was  12  trials  involving  7  subjects  for  GR,  3  trials  involving  3  subjects 
for  MP,  2  trials  with  2  subjects  for  MS,  1  trial  for  SP,  and  4  trials  with  4  subjects  for  UT.  As  with 
the  STRES,  all  UT  data  for  one  subject  (the  same  subject)  were  eliminated.  WRAIR  PAB  outlier 
screening  consisted  of  3  trials  for  the  Time  Wall  task  and  4  trials  for  Interval  Production. 
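
The trial totals quoted above can be checked with a short calculation. The task lists are read from Table 13. The decomposition of 681 into subject-days (79 subjects on seven training and baseline days, plus the 33 and 31 retest subjects on two retest days) is an inference from figures given elsewhere in this report, not a statement made by the authors.

```python
# Task lists taken from Table 13.
stres_tasks = ["GRM", "MTH", "STN2", "STN4", "SPA", "TRK"]  # 1362 trials each
rct_forms = ["1-BASIC", "2-CODED", "3-UNCERT",
             "4-DOUBLE", "5-INVERT", "6-BASIC"]             # 681 trials each
cts_tasks = ["GR", "MP", "MS", "SP", "UT"]                  # 681 trials each
wrair_tasks = ["MAN", "INT", "TIM"]                         # 681 trials each

total_trials = (len(stres_tasks) * 1362
                + (len(rct_forms) + len(cts_tasks) + len(wrair_tasks)) * 681)
print(total_trials)  # 17706

# Inferred decomposition: 79 subjects x 7 days + 64 retest subjects x 2 days.
trials_per_once_daily_task = 79 * 7 + (33 + 31) * 2
print(trials_per_once_daily_task)  # 681

# Individually removed outlier trials, as listed in the narrative.
removed = {
    "GRM": 23, "MTH": 7, "STN2": 4, "STN4": 5, "SPA": 5, "TRK": 9,
    "RCT (all forms)": 16,
    "GR": 12, "MP": 3, "MS": 2, "SP": 1, "UT": 4,
    "Time Wall": 3, "Interval Production": 4,
}
total_removed = sum(removed.values())
print(total_removed)  # 98
```

The total of 98 individually removed trials agrees with the "approximately 100" figure in Table 12. It excludes the whole-subject, whole-session, and whole-form eliminations described in the narrative.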

5.1.1  Normative  Data  from  Baseline  Trials


Table  13  presents  baseline  data  summary  statistics  for  the  major  dependent  variables 
associated  with  each  task.  These  data  represent  performance  on  the  second  day  of  baseline  data 
collection  (first  trial  of  second  day  for  STRES  battery  tasks).  This  particular  trial  was  selected  as 
being  most  representative  of  what  the  subjects  could  accomplish  in  terms  of  performance.  The  first 
STRES  trial  of  the  second  day  was  selected  in  order  to  minimize  the  influence  of  fatigue  which 
may  have  affected  performance  on  the  second  trial.  Means  and  standard  deviations  for  the  selected 
dependent  measures  associated  with  each  STRES  battery,  CTS,  and  WRAIR  PAB  task  are 
included.  Comparable  data  (mean,  standard  deviation,  median,  lower  quartile,  and  upper  quartile) 
for  each  training,  baseline,  and  retest  trial  are  provided  in  separate  appendices  for  each  battery. 

The  data  presented  in  Table  13  reveal  a  high  degree  of  consistency  between  similar  tasks  and 
even  across  different  tasks  for  some  variables  (e.g.,  response  time  for  the  STRES  Reaction  Time 
task  and  percentage  correct  measures  in  general).  The  apparent  consistency  between  similar  tasks 
across  different  batteries  is  encouraging  and  suggests  that  these  variations  of  the  same  task  may 
relate  well  to  one  another.  The  major  exception  to  this  general  correspondence  lies  in  the  Unstable 
Tracking  data  where  values  for  the  CTS  version  are  more  than  twice  the  values  derived  from  the 
STRES  version.  This  task  presents  a  unique  problem  because  the  nature  of  its  presentation  and  the 
calculation  of  the  performance  measures  are  highly  dependent  on  the  specific  software  algorithm 
used.  Thus,  it  is  difficult  to  infer  comparability  or  the  lack  thereof  across  these  task  versions  based 
on  strict  numerical  data  values.  Data  trends  and  general  response  characteristics  serve  as  more 
important  comparative  indices  along  with  the  usual  measures.  Detailed  comparisons  of  similar 
tasks  across  batteries  are  addressed  in  Section  5.3. 

The high degree of similarity across the response time measures for the STRES Reaction Time
task suggests that the various forms of the task all require similar levels of response speed. The
consistency of the percentage correct measures suggests that error rates were fairly low across all
tasks in all batteries. This uniformly high level of performance presents some difficulties with
respect to reliability. These issues are explored further in the reliability analysis (Section 5.2).

5.1.2  STRES  Battery 

Discrete Response STRES Tasks. Figures 4 and 5 present the mean response time for
correct responses (RT) and the percentage correct (PC) data, respectively, for the discrete response
STRES battery tasks. These data represent group performance on each training day (Training 1a to
5b) and each baseline day (Baseline 1a to 2b). On each test day, subjects completed two trials for
each of the STRES tasks except the Reaction Time task. In the figures, the first trial is designated
by the suffix 'a', and the second trial by the suffix 'b'.

Table 13. Means and Standard Deviations for Baseline Data (Baseline Day 2).

Battery  Task          Response Time          Percentage Correct
                       Mean     Std. Dev.     Mean     Std. Dev.
STRES    GRM           4507     1275          96%      7%
         MTH           1522     440           98%      4%
         STN2          476      68            98%      2%
         STN4          552      99            97%      3%
         SPA           947      231           95%      4%

STRES    RCT           Response Time          Percentage Correct
                       Mean     Std. Dev.     Mean     Std. Dev.
         1 - BASIC     562      93            98%      3%
         6 - BASIC     588      124           97%      4%
         2 - CODED     638      112           96%      4%
         3 - UNCERT    678      158           98%      5%
         4 - DOUBLE    595      104           97%      4%
         5 - INVERT    651      120           96%      4%

                       Edge Violations        RMS Error
                       Mean     Std. Dev.     Mean     Std. Dev.
STRES    TRK           0.3      0.8           5.9      4.1

                       Response Time          Percentage Correct
                       Mean     Std. Dev.     Mean     Std. Dev.
CTS      GR            4855     1454          96%      5%
         MP            1703     502           96%      5%
         MS4           615      118           98%      2%
         SP            836      220           93%      4%

                       Edge Violations        RMS Error
                       Mean     Std. Dev.     Mean     Std. Dev.
CTS      UT            1.1      2.6           11.7     7.0

                       Response Time          Percentage Correct
                       Mean     Std. Dev.     Mean     Std. Dev.
WRAIR    MAN           1179     466           97%      4%

                       Interval Mean          Interval S.D.
                       Mean     Std. Dev.     Mean     Std. Dev.
WRAIR    INT           1024     129           72       33
         TIM           9670     868           411      256

Figure 4. Mean Response Time for Discrete Response STRES Tasks.

Figure 5. Mean Percentage Correct for Discrete Response STRES Tasks.


The  figures  suggest  that  stable  performance  was  reached  quite  rapidly  for  the  majority  of 
tasks.  In  fact,  many  tasks  appeared  to  demonstrate  little  improvement  after  the  second  day  of 
training.  The  Grammatical  Reasoning  task  was  a  notable  exception.  This  task  appeared  to  take 
longer  to  learn  in  terms  of  achieving  both  stable  response  time  and  percentage  correct.  Stable 
performance  did  appear  to  be  reached  by  the  fifth  training  day.  The  obvious  explanation  for  this 
difference  is  revealed  in  the  general  separation  between  Grammatical  Reasoning  and  the  other  tasks 
in  terms  of  response  time,  as  well  as  numerous  anecdotal  comments  from  subjects.  That  is,  the 
Grammatical  Reasoning  task  is  generally  viewed  as  the  most  challenging  of  all  the  tasks. 

Reaction  Time  Task.  Figures  6  and  7  present  the  mean  response  time  for  correct 
responses  and  percentage  correct  data  for  the  STRES  Reaction  Time  task.  This  task  is  somewhat 
unique.  The  subject  provides  responses  to  visual  stimuli  in  a  variety  of  forms.  Each  of  these 
forms  of  the  task  is  similar  in  that  responses  are  limited  to  keyboard  button  presses  with  the  index 
and  second  fingers  of  the  right  and  left  hands.  However,  the  instructional  sets  are  different  for 
each  form  (see  AGARD,  1989).  In  the  Basic  form,  the  subject  uses  the  left  hand  if  the  stimulus 
appears  on  the  left  side  of  the  screen  and  the  right  hand  if  the  stimulus  is  on  the  right  side.  Within 
each  hand,  the  leftmost  finger  is  used  if  the  stimulus  is  a  '2'  or  '3'  and  the  rightmost  finger  is  used
if  the  stimulus  is  a  '4'  or  '5'.  A  Basic  series  is  presented  at  the  beginning  (1  -  BASIC)  and  at  the 
end  (6  -  BASIC)  of  the  sequence.  In  the  second  series  (2  -  CODED),  the  stimulus  quality  is 
degraded  to  one  of  four  levels.  Time  uncertainty  for  the  appearance  of  the  stimulus  is  introduced  in 
the  third  series  (3  -  UNCERT)  by  varying  the  interstimulus  interval  between  2  and  10  seconds. 
Double  responses  are  required  in  the  fourth  series  (4  -  DOUBLE)  such  that  the  subject  must  press  a 
sequence  of  three  keys  with  the  same  hand  beginning  with  the  key  for  the  correct  response.  In  the 
fifth  series  (5  -  INVERT),  the  screen-side  to  hand  response  mapping  is  inverted,  i.e.,  stimuli  on 
the  left  side  of  the  screen  require  responses  with  the  right  hand  and  vice-versa. 
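
The response rules for the Basic and Invert forms can be expressed compactly, as in the sketch below. The function name rct_response and its argument names are hypothetical, and the sketch assumes that the finger rule is unchanged in the Invert form, since the description above mentions only an inversion of the screen-side to hand mapping. The Coded, Uncertain, and Double forms are not modeled.

```python
def rct_response(side, digit, form="BASIC"):
    """Return (hand, finger) for a stimulus in the Basic or Invert form.

    side  -- 'left' or 'right' half of the screen
    digit -- stimulus character: '2', '3', '4', or '5'
    form  -- 'BASIC' or 'INVERT'
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if digit not in ("2", "3", "4", "5"):
        raise ValueError("digit must be '2', '3', '4', or '5'")
    if form == "BASIC":
        hand = side
    elif form == "INVERT":
        # Screen side and responding hand are swapped.
        hand = "right" if side == "left" else "left"
    else:
        raise ValueError("only the BASIC and INVERT forms are modeled")
    # Assumed to be the same in both forms.
    finger = "leftmost" if digit in ("2", "3") else "rightmost"
    return hand, finger


print(rct_response("left", "2"))             # ('left', 'leftmost')
print(rct_response("left", "5", "INVERT"))   # ('right', 'rightmost')
```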

The  general  similarity  in  the  requirements  of  the  various  forms  of  the  task  probably  accounts 
for  the  general  uniformity  in  response  time  across  the  forms  (see  Table  13).  Like  many  of  the 
other  STRES  and  CTS  tasks,  the  percentage  correct  measures  for  the  various  forms  of  the  STRES 
Reaction  Time  task  were  uniformly  high  (see  Table  13).  Again,  fairly  stable  performance  on  both 
of  these  measures  is  seen  beyond  the  second  day  for  most  tasks. 

Although  not  statistically  significant,  it  is  interesting  to  note  that  during  the  last  day  of  training 
and  the  two  days  of  baseline  testing  the  mean  response  times  for  the  various  forms  of  this  task 
retained  a  consistent  ordering.  The  first  Basic  series  (1)  provided  the  fastest  RT  followed  by  the 
final  Basic  series  (6).  Next  in  order  were  the  Double  key  press  (4),  the  Coded  stimulus  (2),  and 
the  Inverted  hands  (5)  forms.  Finally,  the  time  Uncertain  form  (3)  had  the  slowest  RT. 
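
The ordering described above can be reproduced from the Baseline Day 2 means reported in Table 13. The short sketch below simply sorts those means; the variable names are illustrative.

```python
# Mean response times for the six forms, from Table 13 (Baseline Day 2).
mean_rt = {
    "1 - BASIC": 562,
    "2 - CODED": 638,
    "3 - UNCERT": 678,
    "4 - DOUBLE": 595,
    "5 - INVERT": 651,
    "6 - BASIC": 588,
}

fastest_to_slowest = sorted(mean_rt, key=mean_rt.get)
print(fastest_to_slowest)
# ['1 - BASIC', '6 - BASIC', '4 - DOUBLE', '2 - CODED', '5 - INVERT', '3 - UNCERT']
```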

Figure  7.  Mean  Percentage  Correct  for  STRES  Reaction  Time  Task.


Unstable  Tracking  Task.  Figure  8  presents  the  Edge  Violation  and  RMS  Error  measures 
for  the  STRES  battery  Unstable  Tracking  task.  Figure  8  suggests  longer  learning  curves  for  the
Unstable  Tracking  task,  compared  to  other  tasks.  Stable  performance  in  terms  of  Edge  Violations 
appeared  earlier  (about  the  third  training  day)  compared  with  RMS  Error  which  does  not  appear  to 
reach  stability  until  the  Baseline  Testing  days.  This  result  is  logical  because  the  number  of  control 
losses  at  the  periphery  (Edge  Violations)  decreases  early  (and  probably  proportionally)  as  the 
subject,  with  continued  practice,  progressively  reduces  tracking  variation  about  the  center  point 
(i.e.,  RMS  Error).  It  is  worth  cautioning,  however,  that  the  Unstable  Tracking  task  is  among  a 
small  group  of  tasks  that  probably  require  more  practice  to  attain  stable  performance  than  most 
other  tasks  assessed  in  this  project.  It  is  even  more  important  to  consider  this  issue  when  one 
recalls  that,  in  this  protocol,  Unstable  Tracking  and  Grammatical  Reasoning  were  actually  practiced 
three  times  per  day  (i.e.,  two  STRES  trials  and  one  CTS  trial  per  day). 
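
To make the relationship between the two tracking measures concrete, the sketch below computes both from a single series of cursor positions. It is a generic illustration only: the function name tracking_measures, the sampling of positions as offsets from the center point, and the rule of counting each arrival at the edge once are all assumptions, and the actual STRES and CTS software may define and compute these measures differently.

```python
from math import sqrt


def tracking_measures(positions, edge_limit):
    """Return (rms_error, edge_violations) for one tracking trial.

    positions  -- sampled cursor offsets from the center point
    edge_limit -- offset magnitude treated as the display edge
    """
    rms_error = sqrt(sum(p * p for p in positions) / len(positions))
    # Count each arrival at the edge once, not every sample spent there.
    edge_violations = 0
    at_edge = False
    for p in positions:
        hit = abs(p) >= edge_limit
        if hit and not at_edge:
            edge_violations += 1
        at_edge = hit
    return rms_error, edge_violations


# Hypothetical trace that reaches the edge once on each side.
print(tracking_measures([0, 3, 4, 0, -3, -4, 0, 0], edge_limit=4))  # (2.5, 2)
```

Under these assumptions, a trace that stays close to the center produces no edge violations long before its RMS error stops shrinking, which is the pattern described above.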

Dual-Task  Combination  (COMBO).  Those  subjects  who  participated  in  the  Response 
Deadline  study  performed  a  series  of  eleven  trials  of  the  COMBO  task  interspersed  throughout  the 
deadline  testing.  This  task  replaced  the  Unstable  Tracking  task  in  the  subject's  normal  task 
sequence.  Summary  tables  and  graphs  for  the  two  component  tasks  are  presented  in  Appendix  C. 
A  comparison  with  single-task  baseline  performance  using  Trial  1  from  Baseline  Day  2  indicates 
that  subjects  maintained  the  same  level  of  performance  on  both  tasks  under  dual-task  conditions. 
The  mean  response  time  for  the  Memory  Search  task  showed  slight  improvement  over  the  course
of  the  eleven  trials. 

Additional  information  regarding  normative  data  for  the  STRES  battery  tasks  is  provided  in 
Appendix  C,  which  contains  tabled  values  for  training,  baseline,  and  retest  trials  and  individual 
graphs  (mean,  median,  lower  quartile,  upper  quartile)  of  training  and  baseline  data  for  each  task. 

5.1.3  CTS  Battery 

Figures  9  and  10  present  similar  performance  data  for  the  CTS  tasks  that  yield  response  time 
and  percentage  correct  measures,  respectively.  These  data,  which  represent  group  performance  on 
each  training  day  (T1-T5)  and  each  baseline  day  (B1-B2),  showed  remarkable  correspondence  to
the  trends  observed  for  similar  STRES  battery  tasks.  In  this  regard,  the  mean  response  time 
measures  stabilized  rapidly  with  the  Grammatical  Reasoning  task  lagging  behind.  Percentage 
correct  measures  were  uniformly  high  across  all  tasks.  This  correspondence  was  expected  because 
these  are  the  tasks  that  are  the  most  similar  across  the  two  batteries.  As  with  the  corresponding 
STRES  tasks,  these  CTS  task  measures  appear  to  reach  reasonable  levels  of  stability  by  the  second 
or  third  day  of  training.  The  only  exception  appears  to  be  the  CTS  Unstable  Tracking  task. 

Figure 9. Mean Response Time for Discrete Response CTS Tasks.


Data for the CTS Unstable Tracking task are presented in Figure 11. Unlike the data from the
STRES Unstable Tracking task, the CTS data in Figure 11 represent only subjects from the
University of Oklahoma. A major difference existed in the data obtained at the two testing sites.
The discrepancy was traced to a difference in the manufacturer of the potentiometer in the tracking
controllers used at Armstrong Laboratory and those used at the University of Oklahoma (even
though both potentiometers meet the specifications). This regrettable but entirely unpredictable
problem left the Armstrong Laboratory data highly inconsistent with both the data collected at the
University of Oklahoma and previous CTS normative data collected at both locations. For that
reason, only the CTS Unstable Tracking data from the University of Oklahoma were included in
summaries and analyses.

In  terms  of  response  characteristics  and  trends,  the  CTS  Unstable  Tracking  data  were  similar 
to  the  tracking  data  from  the  STRES  battery.  More  training  trials  were  needed  to  reach  stability  in 
tracking  performance  compared  with  other  CTS  tasks.  Edge  violations  in  the  CTS  data  also 
appeared  to  stabilize  before  the  RMS  Error. 

Additional  information  regarding  normative  data  for  the  CTS  tasks  is  provided  in  Appendix  D, 
which  contains  tabled  values  for  training,  baseline,  and  retest  trials  and  individual  graphs  (mean, 
median,  lower  quartile,  upper  quartile)  of  training  and  baseline  data  for  each  task.

5.1.4  WRAIR  PAB 

The  mean  response  time  and  percentage  correct  measures  for  the  WRAIR  PAB  Manikin  task 
are  presented  in  Figures  12  and  13,  respectively.  As  with  some  of  the  other  more  difficult  tasks 
noted  previously,  this  task  appears  to  require  more  training  to  achieve  stable  RT  performance.  It 
appears  that  additional  increments  in  performance  efficiency  for  percentage  correct  may  not  be 
important  beyond  trial  four  or  five.  However,  this  trend  may  be  open  to  question  for  mean 
response  time.  Subjects  appeared  to  show  slight  improvement  in  mean  response  time  performance 
even  through  the  baseline  testing  sessions. 

Figures  14  and  15  present  the  means  of  the  estimated  time  intervals  and  the  means  of  the 
interval  variability  (standard  deviation),  respectively,  for  trials  on  the  WRAIR  PAB  Time  Wall 
task.  Stable  performance  appeared  to  be  attained  quite  rapidly  for  this  task.  Although  there  was  a 
slight  decrease  in  the  mean  time  interval  during  the  second  through  fifth  training  trials  (compared 
with  the  first  training  trial),  the  mean  intervals  for  baseline  did  not  appear  to  be  different  from  the 
intervals  on  the  first  training  trial.  Variability  on  this  task  showed  improvement  between  the  first 
and  second  trial,  but  was  stable  throughout  the  remainder  of  the  training  and  baseline  trials. 

Figure 11. RMS Error and Edge Violations for CTS Unstable Tracking Task.

Figure 12. Mean Response Time for WRAIR PAB Manikin Task.

Figure 13. Mean Percentage Correct for WRAIR PAB Manikin Task.

Figure 14. Mean Interval Length for WRAIR PAB Time Wall Task.

Figure 15. Mean Interval Variability for WRAIR PAB Time Wall Task.


The  means  of  average  interval  length  and  interval  variability  (standard  deviation)  for  the 
WRAIR  PAB  Interval  Production  task  are  presented  in  Figures  16  and  17.  Like  the  Time  Wall
task,  stable  performance  was  reached  rapidly  on  the  Interval  Production  task.  By  possibly  the 
third,  and  certainly  the  fourth  trial,  the  mean  interval  measure  appeared  to  reach  asymptote. 
Variability  appeared  to  stabilize  within  two  trials. 

Additional  information  regarding  normative  data  for  the  WRAIR  PAB  tasks  is  provided  in 
Appendix  E,  which  contains  tabled  values  for  training,  baseline,  and  retest  trials  and  individual 
graphs  (mean,  median,  lower  quartile,  upper  quartile)  of  training  and  baseline  data  for  each  task. 

5.1.5  Performance  Percentile  Groupings 

Performance  percentile  groupings  were  calculated  for  each  task  within  each  battery.  These 
performance  percentile  groupings  provide  estimates  of  the  relevant  dependent  measures  for 
performance  categories  ranging  from  Very  Poor  to  Very  Good  in  20  percentile  increments.  These 
tables  may  be  of  particular  interest  to  those  researchers  who  wish  to  categorize  their  subjects 
(individuals  or  groups)  based  on  the  data  from  this  study.  Tables  14  and  15  present  the 
performance  percentile  groupings  for  the  STRES  battery  tasks.  The  performance  percentile 
groupings  for  the  CTS  tasks  are  presented  in  Table  16,  and  the  performance  percentile  groupings 
for  the  WRAIR  tasks  are  presented  in  Table  17.  A  comparison  of  the  current  data  for  the  STRES 
Reaction  Time  task  with  similarly  presented  data  in  AGARD  (1989)  reveals  that  the  current  sample 
of  subjects  had  faster  response  times. 
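
Researchers who wish to build comparable groupings from their own normative sample could proceed along the lines of the sketch below. Only the end labels (Very Poor and Very Good) and the 20-percentile increments come from the text; the three middle labels, the function names, the quantile method, and the data are assumptions made for illustration.

```python
from statistics import quantiles

# Ordered best to worst for response time, where smaller values are better.
# The three middle labels are assumed, not taken from the report.
LABELS = ["Very Good", "Good", "Average", "Poor", "Very Poor"]


def rt_cut_points(norm_rts):
    """Return the 20th, 40th, 60th, and 80th percentile cut points."""
    return quantiles(norm_rts, n=5, method="inclusive")


def rt_category(rt, cut_points):
    """Categorize a response time against the normative cut points."""
    for label, cut in zip(LABELS, cut_points):
        if rt <= cut:
            return label
    return LABELS[-1]


# Hypothetical normative sample: 41 evenly spaced mean response times.
norms = list(range(400, 801, 10))
cuts = rt_cut_points(norms)
print(cuts)                      # [480.0, 560.0, 640.0, 720.0]
print(rt_category(450, cuts))    # Very Good
print(rt_category(600, cuts))    # Average
print(rt_category(790, cuts))    # Very Poor
```

For measures where larger values indicate better performance, such as percentage correct, the order of the labels would be reversed.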

Figure 16. Mean Interval Length for WRAIR PAB Interval Production Task.

Figure 17. Mean Interval Variability for WRAIR PAB Interval Production Task.

Table 15. STRES Reaction Time Task Performance Percentile Groupings.

Table 17. WRAIR PAB Performance Percentile Groupings.


5.2  Reliability  of  UTC-PAB  Measures 


The testing protocol for this UTC-PAB normative database project provided the opportunity to
assess reliability at a variety of time intervals. As noted in Section 4.1, baseline testing for all
subjects was conducted on the first two days of the second week. Approximately one half of the
subjects (N = 33) returned one week later (five days) for two additional days of baseline retesting
and the other half of the subjects (N = 31) returned three weeks (nineteen days) later for two
additional days of baseline retesting. These testing intervals provided numerous
opportunities for assessing test-retest reliability. For example, comparing baseline data from Day 1
in Week 2 to baseline data from Day 2 in Week 2 provided an assessment of test-retest reliability
over 24 hours. Because most of the STRES battery tasks were administered twice per day,
comparison of the first and second trials on Day 2 of Week 2 provided an opportunity to assess
retest reliability over approximately 30 minutes. An assessment of retest reliability over
approximately one week was possible by comparing baseline measurements on Day 2 of Week 2
with baseline measures collected during the first retest session during the third week of the project.
Finally, test-retest reliability over approximately three weeks was provided by comparing baseline
testing on Day 2 of Week 2 with the first day of retesting during Week 5.

The  reliability  computations  were  supplemented  by  analyses  of  variance  to  identify 
performance  differences  across  trials.  These  analyses  are  summarized  in  Section  5.2.4. 

5.2.1  STRES  Battery  Reliability

Because the testing protocol allowed for the collection of two trials per day on many of the
STRES tasks, retest reliability estimates for 30 minutes were calculated for these tasks in addition
to the 24-hour, one-week, and three-week intervals reported for other tasks in this project. Table 18
presents Pearson product-moment correlations between the second baseline testing day (first trial)
and baseline testing days at each of the four time intervals outlined above. Some general trends
emerge from these data. First, a number of tasks appear to have considerable reliability with
regard to their mean response time measure. These include Grammatical Reasoning, Mathematical
Processing, and Spatial Processing. Reliabilities for the response time measures of these tasks fall
well within the acceptable range (i.e., 0.80 to 0.90). The reliabilities for the response time
measures of the two versions of the STRES Memory Search (Sternberg) task fall in the 0.70 to
0.80 range. These values are marginal, yet encouraging, especially considering that these are
performance task reliabilities. However, they are not as strong as would be desired, probably
because the task is relatively easy, so that the performance scores of most subjects are closely
grouped.
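
For reference, the Pearson product-moment correlation used as the test-retest index can be computed as in the sketch below. The function name pearson_r and the paired scores are hypothetical; the scores are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variability.
    return sxy / sqrt(sxx * syy)


# Hypothetical response times for five subjects on two occasions.
baseline_day2 = [520, 610, 480, 700, 560]
retest = [530, 600, 500, 690, 585]
print(round(pearson_r(baseline_day2, retest), 2))
```

The index depends on subjects keeping their rank order and relative spacing across occasions, so it becomes unstable when scores are closely grouped, as noted above for the easier tasks.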
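
Table 18. STRES Battery Test-Retest Correlations with Baseline Day 2.

Retest intervals, in column order: 30-minute, 24-hour, 1-week, 3-week.
Entries are listed in source order. [illegible] marks an unreadable entry;
rows with fewer than four entries are missing cells in the source.

STRES Task Measure         Entries
GRM Response Time          [illegible], [illegible], 0.86
GRM Percentage Correct     [illegible], 0.20
MTH Response Time          [illegible], [illegible], 0.84, 0.83
MTH Percentage Correct     [illegible], [illegible], 0.69, 0.48
STN2 Response Time         [illegible], [illegible], 0.78
STN2 Percentage Correct    [illegible], 0.58
STN4 Response Time         [illegible], 0.82
STN4 Percentage Correct    0.42
SPA Response Time          [illegible], [illegible], [illegible], [illegible]
SPA Percentage Correct     [illegible], [illegible]
TRK Edge Violations        [illegible], 0.26
TRK RMS Error              [illegible], [illegible], 0.74, [illegible]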

In  general,  the  reliabilities  for  the  percentage  correct  measures  for  all  of  the  above  mentioned 
tasks  fall  in  an  unacceptable  category.  This  problem  of  uniformly  low  reliabilities  is  most  likely 
explained  by  the  fact  that  the  percentage  correct  measure  is  subject  to  an  extreme  ceiling  effect  for 
most  tasks.  Due  to  the  ceiling  effect,  so  little  variability  is  retained  in  this  measure  that  reliability 
clearly  becomes  compromised. 
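
The effect of a ceiling on a test-retest correlation can be demonstrated with simulated scores. The sketch below is purely illustrative: the score model (a stable ability plus independent error on each occasion), the noise level, and the placement of the ceiling so that roughly 90 percent of scores sit at the maximum are all assumptions, and none of the numbers are data from this study.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_subjects = 2000

# Each simulated subject has a stable ability plus independent error
# on each testing occasion.
ability = [rng.gauss(0, 1) for _ in range(n_subjects)]
test = [a + rng.gauss(0, 0.5) for a in ability]
retest = [a + rng.gauss(0, 0.5) for a in ability]

# Impose a ceiling low enough that about 90% of scores reach it.
ceiling = sorted(test)[n_subjects // 10]
capped_test = [min(s, ceiling) for s in test]
capped_retest = [min(s, ceiling) for s in retest]

r_full = pearson_r(test, retest)
r_capped = pearson_r(capped_test, capped_retest)
print(round(r_full, 2), round(r_capped, 2))
```

Under this model the uncapped correlation is close to 0.80, while the capped scores yield a visibly lower value even though the underlying abilities and errors are identical.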

The  reliability  figures  for  the  STRES  Unstable  Tracking  task  are  generally  quite  low.  The 
reliability  figures  for  Edge  Violations  are  generally  in  the  unacceptable  category  and  can  be 
explained  by  the  fact  that  very  few  Edge  Violations  occur  beyond  the  first  few  training  trials. 
Again,  a  lack  of  variability,  in  this  case  a  floor  effect,  precludes  effective  reliability  measurement. 
The  RMS  Error  value,  while  retaining  more  variability,  does  not  provide  impressive  reliability 
indices.  The  reliability  of  this  measure  is  moderate  for  fairly  recent  periods  up  to  one  week,  but 
appears  to  drop  off  considerably  at  the  three-week  testing  interval.

It  should  also  be  noted  that  for  many  of  these  measures,  reliability  remains  fairly  constant
across  all  testing  intervals.  Usually,  this  suggests  that  acceptably  stable  performance  can  be 
expected  for  task  performance  up  to  a  three-week  testing  interval. 
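
Table 19. STRES Reaction Time Task Test-Retest Correlations.

Retest intervals, in column order: 24-hour, 1-week, 3-week.
Entries are listed in source order. [illegible] marks an unreadable entry;
rows with fewer than three entries are missing cells in the source.

Block         Task Measure          Entries
1 - BASIC     Response Time         [illegible], [illegible], [illegible]
              Percentage Correct    [illegible], [illegible]
6 - BASIC     Response Time         0.84, 0.76, [illegible]
              Percentage Correct    0.60, 0.61, [illegible]
2 - CODED     Response Time         [illegible], 0.91, [illegible]
              Percentage Correct    [illegible]
3 - UNCERT    Response Time         [illegible], 0.63, [illegible]
              Percentage Correct    0.00, [illegible]
4 - DOUBLE    Response Time         [illegible], [illegible], [illegible]
              Percentage Correct    [illegible]
5 - INVERT    Response Time         0.87, 0.91, [illegible]
              Percentage Correct    0.58, [illegible]
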

Table  19  presents  the  reliability  indices  for  the  various  forms  of  the  STRES  Reaction  Time 
task.  Because  the  STRES  Reaction  Time  task  was  completed  only  once  per  day,  the  calculation  of 
a  thirty-minute  test-retest  reliability  was  not  possible.  Again,  quite  acceptable  reliability  indices 
were  obtained  for  the  response  time  measure  for  each  of  the  STRES  Reaction  Time  task  forms. 
The  one  possible  exception  was  the  time  uncertainty  form  (3  -  UNCERT)  which  provided 
somewhat  marginal  reliability  indices.  These  reliabilities  appeared  to  remain  quite  stable  across  the 
24-hour,  one-week,  and  three-week  testing  intervals. 

Reliability indices for the STRES Reaction Time task percentage correct measures were
generally poor. These indices varied considerably, with some reliability estimates as high as
0.70 to 0.80 and others near zero. Again, ceiling effects in the percentage correct measure
preclude any adequate measure of reliability.

5.2.2  CTS  Reliability 

Table  20  presents  the  reliability  indices  for  the  various  CTS  tasks.  The  response  time  measure 
for Grammatical Reasoning, Mathematical Processing, Memory Search, and Spatial Processing
provided  a  high  degree  of  reliability  across  all  three  testing  intervals.  The  reliability  figures  for  the 
percentage  correct  measures  for  these  tasks  were  again  quite  low  for  the  same  reasons  mentioned 
previously  in  the  STRES  tasks  analysis. 


Table 20. CTS Test-Retest Correlations with Baseline Day 2.

CTS   Task Measure         Retest Interval (24-hour, 1-week, 3-week)
GR    Response Time        [illegible], [illegible], [illegible]
GR    Percentage Correct   [illegible]
MP    Response Time        [illegible], [illegible], 0.77
MP    Percentage Correct   [illegible], 0.41
MS    Response Time        0.81, 0.88
MS    Percentage Correct   0.32, [illegible], 0.34
SP    Response Time        [illegible], [illegible], [illegible]
SP    Percentage Correct   [illegible], [illegible], [illegible]
UT    Edge Violations      0.77, 0.10, [illegible]
UT    RMS Error            0.85, 0.70, [illegible]

The  24-hour  test-retest  reliability  for  the  RMS  Error  measure  of  the  CTS  Unstable  Tracking 
task  fell  in  the  acceptable  range.  Subsequent  reliabilities  for  one-  and  three-week  intervals  were 
marginally  acceptable.  The  24-hour  reliability  index  for  Edge  Violations  was  marginally 
acceptable.  However,  the  reliabilities  for  subsequent  time  intervals  were  unacceptably  low. 

5.2.3  WRAIR  PAB  Reliability 

The  reliabilities  for  the  WRAIR  PAB  Tasks  are  presented  in  Table  21.  The  correlations  for  the 
mean  response  time  for  correct  items  in  the  WRAIR  PAB  Manikin  task  suggest  a  high  degree  of 
reliability  across  the  various  retest  intervals.  The  reliabilities  for  the  percentage  correct  measure  are 
unacceptably  low,  undoubtedly  due  again  to  ceiling  effects. 

Reliability  figures  for  the  mean  time  estimation  for  the  WRAIR  PAB  Time  Wall  task  are 
generally  quite  acceptable  although  the  three-week  reliability  is  somewhat  low.  Reliabilities  for  the 
standard  deviation  measure  are  generally  quite  low.  However,  this  measure  is  generally  of  less 
utility. 

Table 21. WRAIR PAB Test-Retest Correlations with Baseline Day 2.

Task            Task Measure         Retest Interval (24-hour, 1-week, 3-week)
Manikin         Response Time        0.89, 0.89, 0.95
                Percentage Correct   0.37, 0.58, 0.03
Time Wall       Interval Mean        [illegible], [illegible], [illegible]
                Interval Std. Dev.   [illegible]
Interval Prod   Interval Mean        [illegible], [illegible], [illegible]
                Interval Std. Dev.   [illegible]

Reliabilities  for  both  mean  interval  and  standard  deviation  for  the  WRAIR  PAB  Interval 
Production  task  are  generally  quite  low.  This  probably  is  not  surprising  given  the  very  small 
interval  of  time  that  the  subjects  are  asked  to  produce.  In  the  WRAIR  PAB  Time  Wall  task  the 
subject  is  basically  being  asked  to  estimate  over  a  10-second  time  interval.  In  the  WRAIR  PAB 
Interval  Production  task,  the  subject  is  being  asked  to  estimate  one-second  intervals.  At  one  level, 
these  data  suggest  that  time  estimation  may  decrease  in  reliability  as  the  standard  time  interval  being 
estimated  decreases.  Alternatively,  the  visual  component  of  the  Time  Wall  task  may  provide 
sufficient  cueing  to  distinguish  this  from  being  strictly  a  time  estimation  task.  Some  subjects  may 
consistently  under-estimate  the  time  while  others  consistently  over-estimate  the  time,  thus  leading 
to  a  higher  reliability. 

5.2.4  Reliability  Summary 

The  reliabilities  for  tasks  with  response  time  measures  were  generally  acceptable  and  in  many 
cases  fairly  impressive.  These  reliabilities  were  sustained  across  time  intervals  from  30  minutes  to 
three  weeks.  Due  to  ceiling  effects,  accuracy  measures  such  as  percentage  correct  appeared  to  have 
very  poor  reliability.  However,  the  percentage  correct  measures  could  increase  in  reliability  if  the 
testing  conditions  were  changed  to  increase  the  difficulty  of  the  task  either  through  environmental 
or  task  manipulations.  Of  the  various  tasks  evaluated  in  this  project,  the  WRAIR  PAB  Interval 
Production  Task  provided  the  least  evidence  of  reliability.  No  reliability  index  of  any  measure  in 
any  testing  interval  exceeded  0.58  for  this  task.  Also,  of  some  concern  are  the  measures  of 
tracking ability from both the STRES and CTS batteries. Edge violations are generally unreliable
in a traditional sense due to the floor effect. RMS Error values are reasonably reliable on the
CTS Unstable Tracking task, but only marginally so on the STRES battery version of this task.


5.3  Statistical  Analysis  of  Trial  Differences 

Analysis  of  variance  was  used  to  determine  the  statistical  significance  of  any  performance 
differences  that  existed  across  trials  for  three  situations.  The  first  situation  was  to  verify  the 
existence of a training effect. The second series of analyses identified differences across the two
(CTS, STRES RCT, WRAIR PAB) or four (STRES) Baseline trials. The third series examined
differences in performance between Baseline trials and one-week or three-week retest trials. With
respect to the third series, this type of analysis helps to determine if performance has remained
stable  even  when  the  test-retest  correlations  are  low  for  other  reasons.  In  many  ways,  this 
approach  is  more  appropriate  when  measures  exhibit  either  ceiling  or  floor  effects  and  is  also 
appropriate  in  other  situations. 

5.3.1  Training  Trials 

With  few  exceptions,  there  were  significant  improvements  in  performance  across  training 
trials,  with  the  greatest  changes  occurring  between  Trial  1  and  Trial  2.  For  this  reason,  Trial  was 
included  as  a  within-subjects  factor  in  all  ANOVA's  used  to  investigate  the  various  issues  of 
concern  (i.e.,  test  administration,  battery  sequence,  etc.).  The  only  task  measures  which  did  not 
demonstrate  a  statistically  significant  training  effect  were  (1)  STRES  Memory  Search  percentage 
correct  at  both  the  two-character  (p  =  0.67)  and  four-character  (p  =  0.08)  levels,  (2)  CTS  Memory 
Search percentage correct (p = 0.94), (3) WRAIR PAB Time Wall mean interval (p = 0.23), and
(4)  WRAIR  PAB  Interval  Production  standard  deviation  (p  =  0.29).  These  results  would  be 
expected  in  the  case  of  the  Memory  Search  task  on  which  subjects  provide  a  high  level  of  accuracy 
starting  with  the  very  first  trial,  and  demonstrate  performance  improvement  on  the  task  by 
providing  faster  responses  with  no  change  in  accuracy.  The  WRAIR  PAB  Time  Wall  and  Interval 
Production  tasks  in  some  sense  measure  an  inherent  skill  that  does  not  improve  with  practice. 

For  most  tasks  in  this  project,  performance  appeared  to  reach  asymptotic  levels  by 
approximately  the  third  day  of  practice.  However,  it  should  be  noted  that  this  observation  is  based 
largely  on  those  tasks  represented  in  both  the  STRES  and  CTS  batteries,  between  which  some 
transfer  of  skill  may  have  occurred  during  training.  Thus,  learned  in  isolation,  each  of  these  tasks 
may  require  more  practice  to  attain  stability.  Estimates  from  an  earlier  CTS  normative  study 
suggest  four  to  five  trials  of  practice  is  the  minimum  needed  to  obtain  reasonably  stable 
performance (see Schlegel and Gilliland, 1990). More practice is needed on what appear to be the
more difficult tasks (e.g., Grammatical Reasoning and Unstable Tracking). These tasks appeared
to require at least five trials of practice.

5.3.2 Baseline Trials

A  summary  of  the  few  statistically  significant  baseline  trial  effects  is  presented  in  Table  22.  Of 
the  forty  measures  analyzed,  only  nine  demonstrated  some  statistically  significant  variation  across 
the  multiple  baseline  sessions.  Keep  in  mind  that  a  collection  of  univariate  ANOVA's  was 
performed  with  no  project-wise  control  of  the  Type  I  error  level.  Thus,  one  would  expect  some 
significant  results  by  chance.  In  general,  for  those  measures  where  a  difference  was  evident, 
Baseline  Day  2  exhibited  better  performance  than  Baseline  Day  1.  On  the  STRES  tasks,  the  first 
trial  of  Baseline  Day  2  exhibited  better  performance.  A  similar  though  not  statistically  significant 
trend  existed  with  other  tasks,  indicating  the  small  but  continuing  effect  of  learning. 
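
The remark about chance findings can be made concrete with a short calculation. The sketch below assumes a per-test alpha of 0.05 (inferred from the largest p value listed in Table 22, not stated in the text) and assumes the forty tests are independent, which is unlikely to hold exactly for measures taken on the same subjects. The function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha

def familywise_error_rate(n_tests, alpha):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

n_measures = 40   # number of measures analyzed, from the text
alpha = 0.05      # assumed per-test criterion
print(expected_false_positives(n_measures, alpha))
print(round(familywise_error_rate(n_measures, alpha), 3))
```

Under these assumptions, roughly two of the forty tests would be expected to reach significance by chance alone.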


Table 22. Summary of Significant Baseline Trial Differences.

Task Measure                    p > F    Tukey Test (.01)* (improved performance ->)
STRES-GRM Mean RT               0.0001   B1b  B1a  B2b  B2a
STRES-MTH Mean RT               0.0001   B1a  B1b  B2b  B2a
STRES-MTH Proportion Correct    0.0498   B2b  [illegible]  [illegible]  B2a
STRES-SPA Mean RT               0.0061   B1b  B1a  B2b  B2a
STRES-SPA Proportion Correct    0.0001   B1a  B1b  B2b  B2a
STRES-RCT BASIC Block 1 RT      0.0009   B1  B2
STRES-RCT DOUBLE Block 4 RT     0.0022   B2  B1
WRAIR-MAN Mean RT               0.0074   B1  B2
WRAIR-TIM Standard Deviation    0.0166   B2  B1

5.3.3  Retest  Trials 

Figures 18 and 19 present typical results from the one-week and three-week retest trials. An
initial statistical comparison of the performance of the two subject groups using only the training
and baseline data confirmed that there were no significant differences between the groups with
respect to any of the task measures. Having confirmed the comparability of the two subject
groups, separate analyses of variance were conducted to compare baseline and retest performance
for the one-week retest group and the three-week retest group. Statistically significant
differences were few and often revealed greater variability among the baseline trials
themselves than between the baseline and retest trials. In general, when differences occurred, the
first retest trial demonstrated slightly poorer performance than baseline but the second retest trial
indicated better performance than baseline, indicating that subjects recovered quickly and perhaps
continued to improve with only one additional practice trial. Graphs comparing baseline,
one-week, and three-week performance for all tasks are provided in Appendix F.

Figure 18. One-Week and Three-Week Retest Data for STRES Grammatical Reasoning Response Time.

Figure 19. One-Week and Three-Week Retest Data for STRES Mathematical Processing Response Time.

5.4  Comparison  of  Similar  Tasks  Across  Batteries 

Because each daily test period consisted of one session on the CTS tasks but two
sessions  on  the  comparable  STRES  tasks,  caution  must  be  exercised  in  making  an  appropriate 
comparison  of  the  two  batteries.  As  mentioned  previously,  it  is  unclear  to  what  extent  training  on 
the  CTS  transferred  to  the  STRES  and  vice-versa.  An  approximately  equal  number  of  subjects 
performed  the  batteries  in  the  order  CTS-STRES  (37)  and  STRES-CTS  (42)  in  an  attempt  to 
balance  the  transfer  effect. 

Although  one  may  argue  differently,  the  authors  believe  that  the  most  appropriate  approach  is 
to  compare  CTS  and  STRES  performance  on  a  block-by-block  basis  regardless  of  the  day  on 
which the data were obtained. This would allow the seven blocks of CTS data (five training days
plus  two  baseline  days)  to  be  compared  with  the  first  seven  blocks  of  STRES  data  (first  three  and  a 
half  days  of  training).  Comparisons  of  this  nature  are  provided  for  the  four  discrete  response  tasks 
(GR/GRM,  MP/MTH,  MS/STN4,  and  SP/SPA)  in  Figure  20  (Mean  Response  Time)  and  Figure 
21  (Percentage  Correct).  The  comparison  for  the  tracking  tasks  (UT/TRK)  is  presented  in  Figure 
22.  Individual  graphs  for  each  task  using  expanded  scales  are  provided  in  Appendix  G. 

It  is  clear  that  there  exists  very  good  correspondence  between  the  CTS  and  STRES 
implementations  of  most  tasks.  Where  minor  differences  exist,  accompanying  distinctions  in  task 
implementation  can  be  readily  found  to  explain  the  differences.  A  discussion  of  each  task  follows. 

For the Grammatical Reasoning task, there was no statistically significant difference between
the batteries for the RT measure. CTS GR had a slightly higher initial RT, but this difference did
not persist beyond Block 1. There was a statistically significant difference in Percentage Correct
(F(1,78) = 11.99, p = 0.0009) with the CTS GR yielding slightly better performance. One notable
difference in implementations is the use of the words PRECEDES and FOLLOWS in CTS vs.
BEFORE and AFTER in STRES. This difference may have caught subjects off guard on the first
CTS trial. For the Mathematical Processing task, there were no performance differences
for either RT or PC. The STRES and CTS implementations of this task are essentially identical
with only minor differences of screen character size and response device.
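
The 1 and 78 degrees of freedom reported for the battery comparisons are consistent with a single-degree-of-freedom contrast on 79 subjects (37 + 42). As a hedged illustration, the sketch below assumes the battery effect is tested on one CTS score and one STRES score per subject; for a two-level within-subject factor, the F ratio then equals the square of the paired t statistic. The report's actual ANOVA also included Block as a factor, and the function name and accuracy scores here are hypothetical.

```python
import math

def paired_f(cts_scores, stres_scores):
    """Return (F, df_num, df_den) for a two-level within-subject contrast."""
    if len(cts_scores) != len(stres_scores) or len(cts_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")
    diffs = [c - s for c, s in zip(cts_scores, stres_scores)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("differences have zero variance; F is undefined")
    t = mean_d / math.sqrt(var_d / n)   # paired t statistic
    return t * t, 1, n - 1              # F = t squared, df = (1, n - 1)

# Hypothetical percentage correct scores for five subjects.
cts_pc = [98.0, 97.0, 99.0, 96.0, 100.0]
stres_pc = [97.0, 97.0, 98.0, 94.0, 99.0]
f_ratio, df_num, df_den = paired_f(cts_pc, stres_pc)
print(round(f_ratio, 2), df_num, df_den)
```

With 79 subjects the denominator degrees of freedom would be 78, matching the values reported in this section.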

Figure 20. Mean Response Time Comparison of STRES and CTS Discrete Response Tasks.

Figure 21. Mean Percentage Correct Comparison of STRES and CTS Discrete Response Tasks.

Figure 22. Comparison of STRES and CTS Unstable Tracking Performance Measures.


Response times on the CTS 4-character Memory Search task averaged 60 msec slower than
on the STRES version (F(1,78) = 88.80, p = 0.0001) across all blocks. This statistically significant
difference is small and only noticeable when graphed on an expanded scale (Appendix G).
However, it is consistent and may represent differences in the hardware characteristics (response
keypad vs. keyboard) or software features (display scanning, timing routines) of the two
implementations. A statistically significant difference in the Percentage Correct measure (F(1,78) =
15.62, p = 0.0002) was of no practical significance (CTS = 98.2% vs. STRES = 97.5%).

For Spatial Processing, the CTS implementation yielded faster RT's (F(1,78) = 249.17, p =
0.0001). This is likely to be attributable to some combination of (1) differences in the size and
spacing  of  the  histogram  bars,  (2)  filled  (CTS)  vs.  outlined  unfilled  (STRES)  bars,  and  (3) 
implementation  differences  in  terms  of  what  constitutes  a  comparison  stimulus  that  is 
"DIFFERENT"  from  the  standard  stimulus.  In  STRES,  a  difference  of  a  single  unit  for  one  bar  is 
sometimes  the  only  difference  between  the  standard  and  comparison  stimuli  whereas  in  the  CTS, 
the  difference  is  more  noticeable  on  all  trials.  Note  that  there  is  no  difference  in  the  percentage 
correct  measure  for  the  two  implementations. 

Examining  only  the  University  of  Oklahoma  data,  substantial  differences  existed  in  the 
performance  measures  for  the  two  tracking  tasks  with  the  STRES  version  yielding  better 
performance in terms of Edge Violations (F(1,62) = 42.44, p = 0.0001) and RMS Error (F(1,62) =
123.57, p = 0.0001). These differences exhibit both statistical and practical significance and point
to  the  difficulty  of  implementing  a  continuous  analog  task  on  two  different  systems.  Here  again, 
the  differences  are  likely  to  be  a  result  of  hardware  differences  (rotary  controller  vs.  joystick, 
analog-to-digital  converter  characteristics)  and  software  differences  (display  resolution  and 
appearance,  control  loop  programming  and  gain).  It  is  not  clear  how  this  problem  could  be 
remedied  or  how  a  system  could  be  calibrated  to  yield  performance  consistent  with  performance  on 
other  systems. 

An alternative method of presenting the CTS-STRES comparison is to plot data from all
training  and  baseline  blocks  on  a  day-by-day  basis.  Graphs  plotted  in  this  fashion  are  also 
provided  in  Appendix  G.  Although  the  correspondence  between  the  CTS  and  STRES 
implementations  is  less  clear  with  this  approach,  these  presentations  lead  to  conclusions  similar  to 
those  discussed  in  the  preceding  paragraphs.  Statistical  analyses  corresponding  to  this  presentation 
method  involved  comparing  CTS  performance  with  the  average  of  the  two  STRES  blocks  on  a 
day-to-day  basis. 
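
The two ways of aligning the batteries can be sketched as follows. The block-by-block method pairs the seven CTS blocks with the first seven STRES blocks, while the day-by-day method pairs each day's CTS block with the mean of that day's two STRES blocks. The data layout (STRES blocks stored in chronological order, two per day), the function names, and the values are assumptions made for illustration.

```python
def block_by_block(cts_blocks, stres_blocks):
    """Pair CTS block k with STRES block k, using only the first STRES blocks."""
    if len(stres_blocks) < len(cts_blocks):
        raise ValueError("need at least as many STRES blocks as CTS blocks")
    return list(zip(cts_blocks, stres_blocks[:len(cts_blocks)]))

def day_by_day(cts_blocks, stres_blocks):
    """Pair each day's CTS block with the mean of that day's two STRES blocks."""
    if len(stres_blocks) != 2 * len(cts_blocks):
        raise ValueError("expected exactly two STRES blocks per CTS block")
    pairs = []
    for day, cts in enumerate(cts_blocks):
        first = stres_blocks[2 * day]
        second = stres_blocks[2 * day + 1]
        pairs.append((cts, (first + second) / 2.0))
    return pairs

# Hypothetical mean response times (msec): 7 CTS blocks, 14 STRES blocks.
cts = [900.0, 820.0, 780.0, 760.0, 750.0, 745.0, 740.0]
stres = [880.0, 830.0, 800.0, 780.0, 770.0, 760.0, 755.0,
         750.0, 748.0, 745.0, 743.0, 741.0, 740.0, 739.0]
print(block_by_block(cts, stres)[:2])
print(day_by_day(cts, stres)[:2])
```

The block-by-block pairing equates the amount of practice on each battery, whereas the day-by-day pairing equates calendar time, which is why the two methods can lead to somewhat different results.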

With this approach, the only task which did not yield significant battery differences was
Mathematical Processing. Performance on CTS GR averaged 295 msec slower (F(1,78) = 29.90, p
= 0.0001) and was slightly less accurate (F(1,78) = 4.14, p = 0.045). CTS MS was 60 msec slower with
1% higher accuracy. CTS SP averaged 200 msec faster with 1% lower accuracy.

As found in the block-by-block analysis, the two measures for tracking indicated significant
implementation differences with the STRES version remaining the easier task. On the average, the
CTS version yielded more than 8 Edge Violations per trial compared with less than one EV for the
STRES version.

In  summary,  Mathematical  Processing  performance  was  quite  similar  for  both 
implementations,  Memory  Search  was  slightly  (60  msec)  yet  significantly  faster  in  the  STRES 
implementation,  and  Spatial  Processing  was  significantly  slower  in  the  STRES  implementation. 
The  Percentage  Correct  measure  did  not  differ  appreciably  between  implementations  for  any  of  the 
discrete  tasks.  Unstable  Tracking  performance  was  substantially  better  in  the  STRES  version. 

A final session-by-session comparison of the CTS data, data from the first STRES block, and
data from the second STRES block each day shows a consistent improvement from the first to the
second STRES block with the CTS better, worse, or the same as discussed previously.

It  must  be  emphasized  that  due  to  the  relatively  large  number  of  subjects  tested  and  the 
corresponding  high  statistical  power,  the  statistical  significance  of  the  differences  between  the 
batteries  far  outweighs  their  practical  significance.  The  differences  in  RT  ranged  from  60  to  300 
msec  with  typically  less  than  a  1%  difference  in  accuracy.  These  differences  represent  a  range  that 
is  substantially  less  than  the  improvements  that  take  place  from  trial  to  trial  over  the  course  of 
training.  In  essence,  the  agreement  between  the  batteries  is  substantial.  This  is  further  illustrated 
by  the  correlation  coefficients  presented  in  Tables  23  and  24. 

A comparison of the current CTS and STRES database with CTS data from a previous large-scale
normative CTS study (Schlegel and Gilliland, 1990) shows substantial agreement for some
tasks  and  disagreement  for  others  (Appendix  H).  Tasks  showing  close  agreement  include 
Mathematical  Processing  and  Memory  Search. 

There  is  some  disagreement  for  the  Grammatical  Reasoning  and  Spatial  Processing  tasks 
probably  pointing  to  differences  in  the  test  samples  rather  than  structural  differences  in  the  tasks. 
This  is  probably  not  the  case  for  the  Unstable  Tracking  task  where  substantial  performance 
differences  exist  due  to  task  differences  (horizontal  vs.  vertical  tracking),  software  changes  in  the 
timing  loop  subroutine  and  controller  characteristics  (potentiometer  model  and  wear). 

Table 23. Correlation between CTS and STRES Measures by Block.


(OKLA) RMS  0.53  0.59  0.68  0.55  0.32  0.57  0.41


Table 24. Correlation between CTS and STRES Measures by Day.


(OKLA) RMS  0.57  0.68  0.74  0.68  0.62  0.78  0.62


5.5  Effect  of  Group  vs.  Individual  Testing  Procedures 

Group  versus  individual  testing  procedure  effects  were  examined  by  comparing  the 
performance  of  the  Armstrong  Laboratory  subjects,  who  were  tested  individually,  to  that  of  the 
University  of  Oklahoma  subjects,  who  were  tested  in  groups.  Because  the  University  of 
Oklahoma  subjects  were  tested  in  two  phases  (cycles)  these  subjects  were  divided  into  a  Phase  1 
group  and  a  Phase  2  group  for  the  purposes  of  this  analysis.  Figure  23  presents  representative 
data  plotted  by  testing  group  —  in  this  case,  the  mean  response  time  variable  for  the  STRES 
Mathematical  Processing  task.  As  with  most  other  dependent  variables,  there  is  extremely  good 
correspondence  among  the  groups  with  regard  to  the  data  plots  across  testing  sessions.  Similar 
graphs  for  each  dependent  measure  for  the  tasks  of  the  STRES  battery,  CTS,  and  WRAIR  PAB 
are  located  in  Appendix  I. 

The  analyses  conducted  to  examine  the  group  versus  individual  testing  procedures  effect 
consisted  of  a  two-way  ANOVA  with  three  levels  of  testing  group  (Armstrong  Lab,  OU-Phase  1, 
and  OU-Phase  2)  and  Trials  (either  Baseline  or  Training)  conducted  for  each  dependent  variable. 
This  analysis  also  allowed  a  comparison  of  the  two  testing  cycles  at  the  University  of  Oklahoma, 
i.e.,  a  comparison  of  two  groups  of  subjects  tested  under  essentially  identical  conditions. 
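
The group comparison can be illustrated with a simplified one-way analysis of variance on a single score per subject. This sketch is not the two-way (Testing Group by Trials) analysis described above; it omits the Trials factor entirely, and the function name and scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists, one per group."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if df_within < 1:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical baseline mean response times (msec) for the three testing groups.
armstrong_lab = [610.0, 640.0, 655.0, 620.0]
ou_phase_1 = [625.0, 650.0, 630.0, 645.0]
ou_phase_2 = [615.0, 660.0, 635.0, 640.0]
print(one_way_anova_f([armstrong_lab, ou_phase_1, ou_phase_2]))
```

With three testing groups the numerator has two degrees of freedom, and a nonsignificant F supports treating individually tested and group-tested subjects as comparable.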

Because  this  overall  analysis  included  numerous  tests,  it  would  have  been  appropriate  to 
adjust  the  experiment-wise  Type  I  error  rate  through  some  procedure  such  as  dividing  alpha.  Of 
course,  in  this  case,  where  one  might  hope  that  no  differences  would  emerge  among  the  subject 
groups,  Type  I  error  rate  control  procedures  make  the  detection  of  significant  differences  more 
difficult  --  and  thus,  work  in  favor  of  finding  fewer  significant  differences.  A  statistically  less 
conservative approach was taken in the analyses of this project. Alpha was set at p ≤ 0.01 to
provide a conservative Type I error rate, but not divided further. Thus, relative to a divided
alpha, the probability of identifying significant differences was sharply increased. This less
conservative approach was considered acceptable given that even small differences were viewed
as important.
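
"Dividing alpha" is commonly done with a Bonferroni adjustment, in which the overall criterion is divided by the number of tests. The sketch below contrasts the undivided 0.01 criterion used here with a Bonferroni-divided one. The number of tests (40) and the p value are hypothetical, since the report does not state the number of tests in this analysis, and the function names are illustrative.

```python
def bonferroni_alpha(familywise_alpha, n_tests):
    """Per-test criterion after dividing alpha equally among n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return familywise_alpha / n_tests

def is_significant(p_value, alpha):
    """Apply the p <= alpha decision rule."""
    return p_value <= alpha

n_tests = 40                      # hypothetical number of dependent measures
undivided = 0.01                  # criterion used in the report
divided = bonferroni_alpha(0.01, n_tests)
p_example = 0.007                 # illustrative p value
print(divided)
print(is_significant(p_example, undivided), is_significant(p_example, divided))
```

A result that passes the undivided criterion can fail the divided one, which is the sense in which the undivided approach is less conservative and more likely to flag small group differences.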

The  analyses  of  Baseline  data  yielded  only  a  few  significant  differences.  First,  significant 
differences  with  respect  to  Testing  Procedure  were  found  for  both  Edge  Violations  (p  <  0.0001) 
and  RMS  Error  (p  <  0.0001)  on  the  CTS  Unstable  Tracking  task  (Figure  24).  For  both  measures, 
the  differences  revealed  that  the  Armstrong  Laboratory  group  differed  from  the  two  University  of 
Oklahoma  groups.  This  was  caused  by  the  controller  problem  discussed  in  Section  5.1.3  and 
resulted  in  dramatically  different  scores  for  the  Armstrong  Laboratory  subjects.  These  differences 
were  simply  a  result  of  this  equipment  problem. 

Figure 23. Group vs. Individual Test Administration Procedures for STRES Math Processing.

[Figure panels: CTS Unstable Tracking, Edge Violations; CTS Unstable Tracking, RMS Error]


Only  two  additional  differences  were  found  and  these  were  marginally  significant.  The 
analyses  yielded  a  significant  difference  between  testing  groups  for  the  percentage  correct  measure 
of  the  STRES  Grammatical  Reasoning  task  (p  =  0.013)  and  for  the  percentage  correct  measure  of 
the  STRES  Spatial  Processing  task  (p  =  0.015).  Neither  of  these  analyses  actually  reached  the 
critical  p  value  and  subsequent  Tukey  multiple  comparison  tests  failed  to  detect  any  differences  at 
the  0.01  level.  Thus,  it  was  concluded  that  these  differences  were  inconsequential. 

A second similar set of ANOVAs was conducted for the Training data. As expected,
these analyses also yielded significant differences for Edge Violations (p = 0.007) and RMS Error
(p < 0.0001) for the CTS Unstable Tracking task due to the controller problem. Significant
differences  were  also  found  for  two  STRES  Reaction  Time  task  forms.  Analyses  of  percentage 
correct  for  the  first  Basic  series  (p  =  0.0003)  yielded  a  significant  difference  and  the  analyses  of 
percentage  correct  for  the  final  Basic  series  (p  =  0.036)  yielded  a  marginally  significant  difference. 
Subsequent  Tukey  multiple  comparison  tests  yielded  no  differences  at  the  0.01  level,  but  did  yield 
significant  differences  at  alpha  =  0.05  between  the  Armstrong  Laboratory  subjects  and  the  Phase  1 
University  of  Oklahoma  group.  The  Phase  2  subject  performance  was  closer  to  that  of  the 
Armstrong  Laboratory  subjects.  This  difference  was  traced  to  a  procedural  difference  in  which 
Phase  1  subjects  at  the  University  of  Oklahoma  had  not  been  exposed  to  the  Basic  Reaction  Time 
task  during  the  Orientation  session.  While  their  data  were  affected  on  the  first  training  day,  this 
difference  did  not  exist  on  subsequent  days. 

The  only  other  differences  that  were  identified  by  the  ANOVA  analyses  were  marginally 
significant  differences  for  the  percentage  correct  measure  of  the  STRES  Spatial  Processing  task 
(p  =  0.02)  and  the  RMS  Error  measure  for  the  STRES  Unstable  Tracking  task  (p  =  0.05). 
Subsequent  Tukey  multiple  comparison  tests  failed  to  yield  any  significant  differences  at  the  0.01 
level.  Thus,  these  findings  were  viewed  as  inconsequential.  These  analyses  suggest  that  the  type 
of  training  that  subjects  received  had  no  influence  on  performance.  That  the  analyses  did  detect 
two training/testing related problems (i.e., controller failure and missed orientation) indicates that
this analysis was sensitive to variables that would affect performance. However, there was no
evidence for an individual vs. group training effect of the type investigated in this project.

5.6  Effects  of  Task  Order  and  Battery  Sequence 

While  prudence  would  always  suggest  counterbalancing  the  presentation  of  tasks  or  batteries, 
certain  environmental,  time,  or  resource  constraints  can  often  partially  or  completely  compromise 
this  safeguard.  In  such  cases  where  counterbalanced  presentation  is  difficult  or  impossible,  it  may 
be helpful to understand more accurately the potential of such variables to influence performance
data. 


As  noted  in  Section  4.6.4,  four  battery  sequences  and  four  task  sequences  were  used  by 
different  groups  of  subjects  to  examine  the  effects  that  the  order  of  presentation  of  batteries  and 
tasks  within  batteries  might  have  on  task  performance.  Two  general  analyses  were  performed  to 
investigate  these  questions.  The  first  analysis,  performed  on  the  Baseline  data,  consisted  of  a  set 
of  two-way  ANOVA's  for  Battery  Sequence  (four  levels)  and  another  set  for  Task  Sequence  (four 
levels).  As  with  the  testing  group  analysis,  Trial  was  included  as  a  within-subjects  factor.  Also, 
the  same  approach  was  used  with  respect  to  the  establishment  of  the  Type  I  error  level.  Alpha  was 
set at p ≤ 0.01 to provide a conservative Type I error rate, but not divided further. Thus, the
probability  of  detecting  significant  differences  was  sharply  increased,  but  this  less  conservative 
approach  was  considered  acceptable  given  that  even  small  differences  due  to  Battery  Sequence  or 
Task  Order  were  viewed  as  important. 

Figure  25  provides  a  representative  graph  of  task  performance  data  plotted  by  Battery 
Sequence  group.  As  can  be  seen,  the  four  sequences  corresponded  exceedingly  well  across  all 
testing  sessions.  Additional  graphs  for  each  dependent  measure  for  each  task  battery  plotted  by 
Battery  Sequence  group  are  located  in  Appendix  J.  The  analysis  of  Battery  Sequence  across  all 
sessions  (i.e.,  training  and  baseline)  yielded  no  significant  differences  at  alpha  =  0.01.  Three 
analyses  (mean  response  time,  p  =  0.04,  and  percentage  correct,  p  =  0.03,  for  the  STRES 
Mathematical  Processing  task,  and  the  mean  response  time,  p  =  0.04,  for  the  WRAIR  PAB 
Manikin  task)  yielded  differences  that  only  met  traditional  levels  of  significance  (i.e.,  alpha  = 
0.05).  Subsequent  Tukey  multiple  comparison  tests  failed  to  yield  any  significant  differences  even 
at the p ≤ 0.05 level.

Figure  26  presents  representative  data  from  the  CTS  Mathematical  Processing  task  plotted  by 
Task  Order  group.  As  with  the  Battery  Sequence  groups,  there  is  remarkable  correspondence 
among  the  Task  Order  groups.  Additional  graphs  for  each  dependent  measure  for  the  tasks  in  the 
STRES  battery  and  the  CTS  plotted  by  Task  Order  group  are  located  in  Appendix  K.  The  analyses 
for  Task  Order  yielded  absolutely  no  significant  difference  for  any  variable  for  any  task.  No 
analyses  were  even  marginally  significant. 

Together,  these  analyses  revealed  no  evidence  of  Battery  Sequence  or  Task  Order  effects  on 
the  dependent  measures  of  the  task  batteries.  While  counterbalancing  is  always  recommended, 
these  data  suggest  that  in  cases  where  it  is  not  possible,  Battery  Sequence  and  Task  Order  effects 
for  the  tasks  assessed  in  this  project  may  not  pose  serious  threats. 

Figure 25. Battery Sequence Effects for CTS Mathematical Processing.

Figure 26. Task Order Effects for CTS Mathematical Processing.


5.7  Effects  of  Imposing  Response  Deadlines 

Figure  27  presents  an  example  of  the  effect  of  imposing  deadlines  on  the  responses  to  discrete 
tasks, in this case the Grammatical Reasoning task from the STRES battery. The term "actual"
deadline  refers  to  the  situation  when  an  actual  deadline  was  imposed  by  blanking  the  CRT  display 
and  locking  out  any  response  after  the  deadline.  For  "pseudo"  deadlines,  the  subjects  were 
presented  with  instructions  indicating  the  level  of  deadline  imposed,  but  were  not  actually  restricted 
in  their  responses  any  differently  than  in  the  no  deadline  condition. 

The  first  and  most  obvious  conclusion  is  that  an  actual  deadline  hastened  the  subject's 
responses  (or  did  not  count  the  slower  responses  as  acceptable)  and  resulted  in  a  shorter  mean 
response  time  for  correct  responses.  The  magnitude  of  the  speed  increase  is  related  to  the  severity 
of  the  deadline.  Accompanying  the  increase  in  speed  is  an  increase  in  the  number  of  totally  missed 
responses, and therefore a decrease in the percentage correct. More surprisingly, the
use of a pseudo deadline appears to have a similar impact, in that response time decreases
to  comparable  levels.  However,  because  there  is  no  actual  time-out  on  the  responses,  there  are 
fewer  missed  responses,  and  the  percentage  correct  is  essentially  the  same  as  for  the  No  Deadline 
condition. 
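The difference between the two deadline types can be illustrated with a small scoring sketch. The function name, the data, and the 1500 ms deadline below are hypothetical and are not taken from the study software; the sketch only assumes that an actual deadline scores late responses as missed, while a pseudo deadline scores every response normally.

```python
from statistics import mean

def score_trials(trials, deadline_ms=None, enforce=False):
    """Summarize one block of discrete-task trials (hypothetical scoring rule).

    trials: list of (response_time_ms, is_correct) tuples.
    deadline_ms: deadline announced to the subject (None = no deadline).
    enforce: True for an "actual" deadline (late responses are locked out
             and counted as missed); False for a "pseudo" deadline.
    """
    correct_rts = []
    missed = 0
    for rt, is_correct in trials:
        if enforce and deadline_ms is not None and rt > deadline_ms:
            missed += 1  # response locked out: scored as a miss
            continue
        if is_correct:
            correct_rts.append(rt)
    pct_correct = 100.0 * len(correct_rts) / len(trials)
    mean_rt = mean(correct_rts) if correct_rts else float("nan")
    return {"mean_rt": mean_rt, "pct_correct": pct_correct, "missed": missed}

# Made-up responses, scored twice under the same nominal deadline.
trials = [(900, True), (1200, True), (1600, True), (2100, True), (1000, False)]
actual = score_trials(trials, deadline_ms=1500, enforce=True)
pseudo = score_trials(trials, deadline_ms=1500, enforce=False)
```

With identical responses, the actual deadline yields a shorter mean response time for correct responses simply because the slow responses are excluded, and it lowers the percentage correct through missed responses. This is the scoring artifact noted parenthetically above, and it is separate from any real change in subject behavior.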

This  is  particularly  apparent  when  examining  the  last  series  of  trials  where  each  subject, 
regardless of the deadline under which training was conducted, produced a similar pattern. Both
response  time  and  percentage  correct  decreased  with  stricter  actual  deadlines.  Under  the  pseudo 
deadline  conditions,  however,  subjects  performed  faster  with  no  apparent  loss  in  accuracy.  This 
finding  is  important  in  terms  of  developing  techniques  to  maximize  subject  performance,  but  adds 
another  complication  to  determining  the  reliability  of  the  tasks.  A  similar  pattern  of  results  was 
obtained with the other four STRES tasks used in the Deadline Study.

5.8 Effects of Extended Trial Length


The  data  from  the  extended  trial  analyses  provide  the  opportunity  to  explore  the  influence  of 
this  variable  only  at  a  rudimentary  level.  The  fact  that  (1)  protocol  limitations  did  not  allow  all 
combinations  of  the  tasks  to  be  offered,  and  (2)  not  all  trial  lengths  could  be  randomly  presented, 
may have compromised the full generalizability of these results. In fact, a visual examination of
the data suggests that in some cases the 6-minute trial length data may have been highly affected by
such factors. It also appears that the 24-minute data probably provided the best opportunity to
explore trial length effects. Nonetheless, the data did provide a preliminary opportunity to examine
the  influence  of  this  variable. 

As  was  noted  in  Section  4.0  (Project  Design  and  Method),  the  extended  trial  length  data  were 
collected  on  a  limited  set  of  STRES  battery  tasks  only.  In  addition,  analysis  priority  was  given  to 
the  STRES  Unstable  Tracking  task  due  to  the  continuous  nature  of  the  task.  It  was  hypothesized 
that  a  task  that  is  relatively  continuous  in  nature  would  not  provide  brief  periods  between  discrete 
trials  that  could  be  utilized  as  rest  pauses.  This  constant  demand  for  performance  would  more 
likely  provide  the  opportunity  to  observe  any  effects  of  extended  trial  length  on  performance.  For 
this  reason,  the  discussion  of  extended  trial  length  effects  will  focus  on  the  STRES  Unstable 
Tracking  data.  However,  a  more  limited  examination  of  results  from  other  tasks  will  also  be 
presented. 

Figures  28  and  29  present  the  Edge  Violation  and  RMS  Error  measures,  respectively,  for  the 
STRES  Unstable  Tracking  task  across  the  various  extended  trial  lengths.  The  3-minute  trial  length 
data  (Baseline  Day  2,  Trial  1)  are  presented  in  the  dark  column  in  the  foreground  followed  by  each 
increasing  trial  length  condition  divided  into  3-minute  epochs.  It  should  be  noted  that  6-minute 
trial  length  data  (transparent  in  the  figures)  appear  somewhat  unusual  and  may  reflect  the  fact  that 
this  condition  was  not  counterbalanced  in  its  presentation.  Due  to  protocol  restrictions,  the  6- 
minute  trial  length  condition  always  followed  one  of  the  longer  trial  length  conditions.  Thus,  these 
data  may  be  revealing  a  fatigue  factor  that  is  not  evident  in  other  data.  Also,  because  the  tracking 
task  is  continuous  and  requires  constant  motor  performance,  this  fatigue  effect  may  be  more  likely 
to  emerge  in  these  data  than  perhaps  in  any  other  task. 
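As an illustration of how a continuous tracking trial might be divided into 3-minute epochs, the following sketch computes an RMS error per epoch. The sample values and the number of samples per epoch are assumptions made for the example; the report does not state the sampling rate or the exact error definition used by the STRES software.

```python
import math

def rms_by_epoch(errors, samples_per_epoch):
    """Return the RMS error for each complete epoch of a tracking trial.

    errors: cursor deviations from target, one per sample (arbitrary units).
    samples_per_epoch: number of samples in one 3-minute epoch (depends on
                       the sampling rate, which is assumed here).
    """
    n_epochs = len(errors) // samples_per_epoch
    result = []
    for i in range(n_epochs):
        chunk = errors[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        result.append(math.sqrt(sum(e * e for e in chunk) / len(chunk)))
    return result

# Toy 6-minute trial: two epochs of four samples each (made-up values).
errors = [3, -3, 3, -3, 4, -4, 4, -4]
epochs = rms_by_epoch(errors, samples_per_epoch=4)
```

In this toy case the second epoch has the larger RMS error, which is the kind of within-trial change the epoch-by-epoch plots are meant to reveal.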

Figure 29. Extended Trial Length Data - Unstable Tracking RMS Error.

Collapsing the data across all 3-minute epochs within each trial length condition provided the
opportunity to examine whether there was a difference in average performance across the four trial
lengths. This one-way ANOVA resulted in a significant trial length effect for RMS Error (F(3,90) =
8.99, p < 0.0001), but only a marginally significant effect for Edge Violations (F(3,90) = 3.15,
p = 0.07).³ Subsequent multiple comparisons (paired t-tests corrected for experiment-wise Type I
error rate) revealed that the Baseline 3-minute trial differed from the 6-minute trial length condition
(t(30) = 3.85, p = 0.0006) and the 24-minute trial length condition (t(30) = 3.35, p = 0.002). In
general, these results based on data collapsed across epochs within the trial lengths suggest that the
average Unstable Tracking performance level across the various trial lengths is not the same.
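For readers unfamiliar with the paired comparison used here, a minimal sketch of the paired t statistic follows. The scores are invented for illustration and the function name is hypothetical; only the formula (mean difference divided by its standard error, with n - 1 degrees of freedom) is standard. The 30 degrees of freedom reported above would correspond to 31 paired observations.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Made-up RMS Error scores for five subjects (not data from the study).
long_trial = [12.0, 15.0, 11.0, 14.0, 13.0]
baseline = [10.0, 12.0, 10.0, 11.0, 12.0]
t, df = paired_t(long_trial, baseline)
```

Because each subject serves as his or her own control, the test operates on the within-subject differences rather than on the two group means separately.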

Another  one-way  ANOVA  was  performed  on  the  first  3-minute  epoch  from  each  trial  length 
condition. This analysis tested whether the initial performance of the subject was different given
the various trial length conditions. This analysis yielded a significant trial length main effect for
both RMS Error (F(3,90) = 9.79, p < 0.0001) and Edge Violations (F(3,90) = 7.87, p = 0.005).
Subsequent  multiple  comparisons  (paired  t-tests  protected  for  experiment-wise  error  rate)  revealed 
that  the  only  significant  differences  involved  comparisons  with  the  6-minute  trial  length  data.  Due 
to  the  somewhat  questionable  nature  of  the  6-minute  data,  it  was  concluded  that,  while  statistically 
significant,  these  were  probably  not  theoretically  important  findings.  Aside  from  the  obviously 
anomalous  data  from  the  6-minute  trial  length  condition,  there  appeared  to  be  no  significant 
differences  between  the  first  3-minute  epochs  in  any  trial  length.  Thus,  subjects  appeared  to  begin 
each  trial  length  with  fairly  comparable  Unstable  Tracking  performance,  at  least  in  the  first  three 
minutes. 

Given that the average Unstable Tracking RMS Error performance of subjects was different
across the various trial lengths, the next logical question was where these trials differed.
The Baseline 3-minute trial was compared to each set of epochs from the other trial length
conditions. From the analysis of initial 3-minute epochs, it was clear that RMS Error performance
during the Baseline 3-minute trial differed significantly from the initial 3-minute epoch of the
6-minute trial length condition. Not surprisingly, a subsequent analysis confirmed that RMS Error
performance from the Baseline 3-minute trial was also significantly better than the second 3-minute
trial of the 6-minute trial length condition (t(30) = 3.92, p = 0.0005). An analysis comparing the
Baseline 3-minute trial to each additional epoch in the 12-minute trial length condition failed to
yield any significant differences for RMS Error. Comparisons between the Baseline 3-minute trial
and other epochs in the 24-minute trial length condition yielded a number of significant differences
for RMS Error performance. In fact, simple paired t-test comparisons of the Baseline trial with all
3-minute epochs other than the first were significant at or near the p ≤ 0.01 level of significance.
However, when alpha was adjusted to protect the family-wise Type I error rate, RMS Error
performance during the Baseline 3-minute trial was significantly better than only the third
(t(30) = 3.89, p = 0.0005), fourth (t(30) = 3.46, p = 0.0016), sixth (t(30) = 3.33, p = 0.002), and
seventh (t(30) = 3.57, p = 0.0012) epochs in the 24-minute trial length condition.

³ All values of alpha cited for ANOVA analyses employ Greenhouse-Geisser corrections.

The above results are evidence that performance did vary across the 24-minute period.
Subjects  appeared  to  have  periods  in  which  they  performed  as  well  as  they  did  during  the  Baseline 
3-minute  period.  Likewise,  they  also  had  periods  where  their  performance  was  significantly  worse 
than  the  Baseline  3-minute  trial.  During  the  24-minute  trial  condition,  subjects  did  not  appear  to 
have  any  periods  during  which  their  performance  was  better  than  the  3-minute  trial  condition. 
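The report does not name the procedure used to protect the family-wise Type I error rate. As a sketch only, assuming a Bonferroni adjustment with a family-wise alpha of 0.05 over the seven Baseline comparisons (the second through eighth 3-minute epochs of the 24-minute trial), the adjusted threshold and the reported p-values can be compared as follows. The function name and dictionary keys are hypothetical.

```python
def bonferroni_threshold(family_alpha, n_comparisons):
    """Per-comparison alpha under a Bonferroni adjustment (assumed method)."""
    return family_alpha / n_comparisons

# p-values reported in the text for the third, fourth, sixth, and seventh epochs.
reported = {"epoch 3": 0.0005, "epoch 4": 0.0016, "epoch 6": 0.002, "epoch 7": 0.0012}

# Assumption: seven comparisons (epochs 2 through 8) at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 7)
survivors = [name for name, p in reported.items() if p <= threshold]
```

Under these assumptions the per-comparison threshold is roughly 0.007, and all four reported epochs fall below it, which agrees with the pattern described above. Comparisons that were only "at or near" the 0.01 level would not survive such an adjustment.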

Given  some  of  the  limitations  in  the  collection  of  the  extended  trial  data  that  were  mentioned  at 
the  beginning  of  this  section,  a  very  cautious  examination  of  the  performance  of  other  STRES 
battery  tasks  over  the  various  trial  lengths  was  conducted.  These  analyses  were  attempted  to  reveal 
general  trends  of  interest  only.  Extensive  and  detailed  analyses  were  viewed  as  inappropriate  given 
the  aforementioned  limitations.  In  this  regard,  comparisons  between  the  Baseline  3-minute  trial 
and  data  collapsed  across  epochs  within  the  other  trial  length  conditions  revealed  that  only  the 
percentage  correct  measures  for  the  STRES  Mathematical  Processing  and  the  STRES  Spatial 
Processing  tasks  yielded  significant  results.  For  Mathematical  Processing,  the  percentage  correct 
went  from  approximately  98%  for  the  Baseline  three-minute  trial  to  approximately  96%  for  the  24- 
minute  trial  while  Spatial  Processing  declined  from  96%  to  92%.  While  percentage  correct 
remained  constant  across  epochs  within  a  given  trial  length,  there  was  a  small,  yet  statistically 
significant  decline  in  the  average  percentage  correct  across  trial  length  conditions.  This  suggests 
that  as  trial  length  increases,  the  average  percentage  correct  decreases. 

Of  more  interest  were  the  results  of  a  general  analysis  of  the  24-minute  trial  length  data.  These 
data were viewed as among the most viable. One-way ANOVAs were performed across epochs
for  each  of  the  major  dependent  variables  for  the  STRES  battery  tasks.  The  results  of  these 
analyses  revealed  significant  differences  for  the  response  time  measures  of  the  STRES 
Grammatical  Reasoning  and  Mathematical  Processing  tasks.  Figures  30  and  31  present  the  data 
for  these  tasks,  respectively.  It  can  be  seen  that  there  is  a  decrease  (improvement)  in  response  time 
across  the  epochs.  This  trend  can  be  seen  for  other  trial  lengths  as  well.  This  apparent 
improvement  was  not  expected.  Also,  it  should  be  noted  that  the  6-minute  trial  length  condition 
did  not  show  the  rather  unusual  level  of  difference  that  appeared  in  the  tracking  data. 

Explanations for these results are not entirely clear. It is possible that the continuous nature of
the tracking task and its constant demand on psychomotor activity lead to a level of neuromuscular
fatigue that is not present in the other tasks. In fact, the intermittent and discrete nature of the other
tasks, coupled with possible factors such as increased attention over time to skills necessary for
optimal task performance, may have allowed the subjects to actually become more proficient
(respond faster) on these tasks, compared with the tracking task. In other words, continued
learning may have taken place with intensified, massed practice. The aberrant nature of the
6-minute trial length tracking data (i.e., the condition most often performed after prolonged tracking
trials) gives some support to this hypothesis. It should be noted, however, that the unexpected
decrease in response time for these tasks may be derived not only from the opportunity to optimize
performance over time, but also from a shift in task strategy. The declines in percentage correct
measures noted above may be a subtle indication that the subjects are trading a certain degree of
accuracy for gains in speed. This possible regulation of the speed/accuracy tradeoff may be an
important index of prolonged performance efficiency.

Figure 30. Extended Trial Length Data - Grammatical Reasoning Response Time.

Figure 31. Extended Trial Length Data - Math Processing Response Time.
[Plot: STRES Math Processing Task, Response Time by Trial Length]

5.9  Usefulness  of  Psychometric  State  Measures 

Both  the  Stanford  Sleepiness  Scale  and  the  Mood  Scale  II  were  administered  at  various  times 
throughout  the  testing.  These  scales  were  included  to  obtain  information  regarding  their  ease  of 
administration  and  usefulness  as  research  tools  in  assessing  the  impact  of  testing  variables.  No 
extensive  attempt  was  made  to  perform  a  detailed  analysis  of  these  data  at  the  present  time. 
However,  the  subjects'  responses  to  these  questionnaires  were  compiled  and  plotted  for  visual 
analysis.  It  should  be  remembered  that  during  Training,  Baseline,  and  Retest  sessions,  these 
questionnaires  were  administered  prior  to  and  after  the  subjects  completed  the  STRES  battery 
portion  of  their  daily  testing  protocol.  Half  of  the  subjects  in  any  given  testing  session  performed 
the STRES battery during the first hour of the two-hour session while the other half of the subjects
were  performing  the  CTS-WRAIR  PAB  tasks.  These  groups  then  switched  workstations.  Thus, 
half  the  subjects  completed  the  psychometric  questionnaires  at  the  beginning  of  the  testing  session 
and  then  again  at  the  midpoint  in  the  testing  session.  The  other  half  of  the  subjects  completed  the 
questionnaires  first  at  the  midpoint  of  the  testing  session  and  then  after  the  testing  session  ended. 

Figure  32  presents  the  responses  of  subjects  on  the  Stanford  Sleepiness  scale.  The  subjects 
performing  the  STRES  Battery  first  in  the  testing  session  are  plotted  in  the  foreground.  The  other 
half  of  the  subjects,  who  performed  the  STRES  battery  during  the  second  half  of  the  testing 
session,  are  plotted  in  the  background.  From  this  figure,  it  appears  that  subjects  were  generally 
low  in  sleepiness  throughout  the  testing  period.  However,  they  appeared  to  increase  in  sleepiness 
slightly  during  the  first  one  hour  of  testing.  The  subjects  who  performed  the  STRES  battery 
second  show  a  similar  level  of  sleepiness  at  the  midpoint  in  the  testing  session  as  compared  to  the 
group  performing  the  STRES  battery  first.  They  then  reported  an  additional  increase  in  sleepiness 
during  the  final  one  hour  of  testing.  By  examining  both  the  between-  and  within-group  data,  it 
appears that sleepiness gradually increased across the two-hour testing session. It should be
remembered that these trends were not confirmed with statistical tests, although they are
supported by anecdotal reports. Subjects often reported feeling more tired and fatigued at the end
of the sessions.

Figure 32. Stanford Sleepiness Scale.


The trends seen in the Stanford Sleepiness Scale were also confirmed, to some degree, by
similar trends in relevant scales of the Mood Scale II. Figures 33 through 39 present the various
subscales of the Mood Scale II, as well as the mean response time to complete the scale. It appears
that the Activity and Happiness subscales demonstrated trends somewhat like the Stanford
Sleepiness Scale. That is, these two scales revealed modest declines in activity level and happiness
through the test session, especially from the first administration to the second administration within
the testing session. However, these data did not show the continual change found in the sleepiness
scale.

Figure 33. Mood Scale II - Activity Scale.
Figure 34. Mood Scale II - Happiness Scale.
Figure 35. Mood Scale II - Fatigue Scale.
Figure 36. Mood Scale II - Anger Scale.
Figure 37. Mood Scale II - Depression Scale.
Figure 38. Mood Scale II - Fear Scale.
Figure 39. Mood Scale II - Response Latency.


Scales  such  as  Fatigue  and  Anger  (frustration)  would  be  predicted  to  show  trends  in  the 
opposite  direction.  In  fact,  a  visual  inspection  of  these  scales  reveals  that  they  did  indicate  apparent 
increasing  levels  of  self-reported  fatigue  and  anger.  Both  trends  appeared  to  be  fairly  continuous 
across  the  testing  session. 


The two more clinically related scales (Depression and Fear), which would be predicted to have very little relationship to
psychological  state  during  testing  revealed  little  to  no  change  across  the  testing  session.  Finally,  it 
appears  that  the  subjects  completed  the  questionnaires  more  slowly  during  the  first  administration 
of  the  scale  in  comparison  to  the  second  administration  during  each  testing  session. 


These  data  appear  encouraging  with  respect  to  the  potential  usefulness  of  psychometric 
measures  in  assessing  psychological  state  during  task  performance.  However,  several  points 
should  be  kept  in  mind.  First,  this  was  a  very  preliminary  analysis.  Trends  in  the  subjects' 
responses  were  clearly  evident  and  the  trends  were  supported  by  similarities  in  dimensions  that 
were  rationally  related.  However,  these  trends  were  not  tested  statistically.  Second,  the  range  of 
scores in most cases was very narrow, many times less than one scale point (out of seven for Stanford
Sleepiness and out of three for Mood Scale II). The trends and consistencies that were seen were
all the more impressive for this reason, yet adequate measurement ability will probably require
greater use of the full range of the scale score responses. Finally, if a factor was assessed by these
data it was probably the cumulative effect of testing over the course of two hours. This effect may
have been quite small, which is itself encouraging for the use of such scales. More
powerful effects, such as drugs or environmental variables, may be good candidates for further
study of the usefulness of these psychometric scales.

6.0 SUMMARY


The  following  comments  summarize  the  results  of  this  project.  While  there  were  numerous 
detailed  analyses  conducted  during  this  study,  and  while  there  are  still  many  more  that  can  be 
performed,  there  are  a  number  of  general  statements  that  characterize  the  major  findings  of  this 
project. 

•  A  substantial  database  has  been  established  for  selected  tasks  from  the  STRES  battery,  CTS 
and  WRAIR  PAB  based  on  a  well-defined  population  and  a  sizable  number  of  performance 
trials.  Percentile  breakpoints  at  20%  increments  were  included  to  allow  categorization  of 
subject  performance  as  Very  Good,  Good,  Average,  Poor  or  Very  Poor. 

•  With  few  exceptions,  the  data  obtained  in  this  project  showed  remarkable  consistency  across 
task  batteries  and  within  task  types  both  in  terms  of  actual  dependent  measure  values  and 
general  response  characteristics. 

•  The  reliability  of  the  tasks  varied  by  task  type  and  dependent  measure.  In  general,  response 
time  measures  of  various  tasks  yielded  acceptable,  and  in  some  cases  very  good,  reliability 
indices.  Percentage  correct  measures  of  tasks  were  almost  uniformly  unacceptable,  due  most 
probably  to  ceiling  effects.  Timing  tasks  (WRAIR  PAB  Time  Wall  and  Interval  Production) 
varied  in  their  reliability,  but  neither  yielded  an  impressive  level  of  reliability. 

•  Comparisons  of  similar  tasks  across  batteries  yielded  a  variety  of  findings.  Significant 
differences were sometimes found between versions of the same task; however, these
differences  were  often  not  greater  in  magnitude  than  the  difference  associated  with  the  day-to- 
day  changes  experienced  over  training  sessions.  The  Mathematical  Processing  and  Memory 
Search  tasks  yielded  no  differences  of  any  importance.  The  percentage  correct  measure  for  the 
Grammatical  Reasoning  task  was  higher  on  the  CTS  version  compared  with  the  STRES 
version,  but  no  difference  was  found  for  response  time.  The  CTS  Spatial  Processing  task  had 
faster response times than the STRES version, but demonstrated no difference in percentage
correct.  The  only  task  to  show  substantial  differences  for  all  dependent  measures  was  the 
Unstable  Tracking  task.  The  STRES  version  of  this  task  provided  a  smaller  RMS  Error  and 
fewer  Edge  Violations. 

•  Comparison  of  the  CTS  data  to  a  previous  CTS  database  revealed  good  correspondence  with 
the  exception  of  the  Unstable  Tracking  task,  which  had  been  substantially  changed  between 
CTS Version 1.0 and CTS Version 2.0, which was used in this study.

•  No important differences due to the influence of task order were observed in this study.

•  No  important  differences  due  to  the  influence  of  battery  sequence  were  observed  in  this  study. 

•  Response  deadlines  provided  a  faster  mean  response  time  but  at  the  expense  of  more  missed 
responses  when  actual  deadlines  were  imposed.  However,  when  subject  response  urgency 
was  increased  through  instructions,  the  faster  mean  response  times  were  not  accompanied  by  a 
significantly  lower  percentage  correct. 

•  Results  from  the  Extended  Trial  Length  analysis  revealed  that  during  the  first  three-minute 
epoch  of  performance,  subjects  appeared  to  perform  at  about  the  same  level  regardless  of  the 
overall trial length. However, average performance across individual trial lengths differed
from  one  trial  length  to  another.  There  was  some  evidence  that  subjects  performed  the 
continuous  Unstable  Tracking  task  more  poorly  and  more  erratically  over  an  extended  period 
of  time.  They  also  appeared  to  show  improvement  in  response  time  for  some  discrete  tasks 
over  the  extended  trial  length,  but  this  may  have  been  the  result  of  a  subtle  speed-accuracy 
tradeoff. 

•  Preliminary  analyses  suggest  that  the  psychometric  state  measures  included  in  this  study  have 
potential  for  effectively  assessing  changes  in  psychological  state. 
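The percentile breakpoints mentioned in the first summary point can be applied as in the following sketch. The norm sample, the function names, and the rule for scores that fall exactly on a breakpoint are assumptions made for illustration; only the 20% increments and the five category labels come from the report.

```python
from statistics import quantiles

LABELS = ["Very Good", "Good", "Average", "Poor", "Very Poor"]

def breakpoints(scores):
    """20th, 40th, 60th, and 80th percentile cut points of a norm sample."""
    return quantiles(scores, n=5, method="inclusive")

def categorize(score, cuts, lower_is_better=True):
    """Map a score to one of five labels using the four cut points."""
    band = sum(score > c for c in cuts)  # number of cut points exceeded (0 to 4)
    if not lower_is_better:
        band = 4 - band  # e.g. percentage correct, where high scores are good
    return LABELS[band]

# Made-up mean response times (msec) standing in for a normative sample.
norm_rts = [500, 520, 540, 560, 580, 600, 620, 640, 660, 680, 700]
cuts = breakpoints(norm_rts)
fast = categorize(510, cuts)
middle = categorize(600, cuts)
slow = categorize(690, cuts)
```

For response time measures a lower score is better, so the lowest fifth maps to Very Good. For percentage correct the direction would be reversed by passing `lower_is_better=False`.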
APPENDIX A

SUBJECT INSTRUCTIONS FOR DEADLINE TESTING

Subject Instructions for Deadline Study - First Day (all subjects)

Until today you have been asked to respond as quickly and accurately
as possible but there have been very liberal time deadlines placed
on your performance (typically 15 seconds).

Starting  today  you  will  train  and  be  tested  under  conditions  which 
limit  the  amount  of  time  you  have  to  respond.  Initially,  this  limit 
will  be  indicated  by  the  stimulus  disappearing  from  the  screen. 

There  are  two  levels  of  deadline  time  limits:  MODERATE  and  VERY  SHORT. 

They are different for each task. You will be informed of the level at
the  start  of  each  session. 


Attempt  to  maintain  your  current  level  of  accuracy,  but 
PLEASE  TRY  TO  RESPOND  BEFORE  THE  STIMULUS  DISAPPEARS!! 


Subject  Instructions  for  Deadline  Study  -  Second  Day  (all  subjects) 

Today,  your  first  session  will  be  under  the  same  deadlines  as  yesterday. 

The  remaining  sessions  will  be  under  three  different  deadline  levels: 

NO  deadline,  a  MODERATE  deadline,  and  a  VERY  SHORT  deadline,  but  not 
necessarily  in  that  order.  You  will  be  informed  of  the  deadline  level 
at  the  start  of  each  session. 


Attempt  to  maintain  your  current  level  of  accuracy,  but 
PLEASE  TRY  TO  RESPOND  BEFORE  THE  STIMULUS  DISAPPEARS!! 


Subject Instructions for Deadline Study - Third Day (all subjects)

Today, the three sessions will be under different deadline levels as
they were yesterday. When everyone has completed all three sessions,
we will meet as a group to discuss the experiment.

Subject Instructions for No Deadline


The  following  session  will  be  conducted  under  no  response  deadline 
as  during  all  the  training  sessions.  While  attempting  to  maintain 
a  high  level  of  accuracy,  please  respond  as  quickly  as  possible. 


Subject  Instructions  for  Moderate  Deadline  -  Actual  Deadline 

The  following  session  will  be  conducted  under  a  MODERATE  response 
deadline.  While  attempting  to  maintain  a  high  level  of  accuracy, 

PLEASE  RESPOND  BEFORE  THE  STIMULUS  DISAPPEARS  FROM  THE  SCREEN! 


Subject Instructions for Moderate Deadline - Pseudo Deadline

The  following  session  will  be  conducted  under  a  MODERATE  response 
deadline.  The  stimulus  may  not  disappear  from  the  screen  but  the 
deadline  cutoff  is  still  in  place.  While  attempting  to  maintain 
a high level of accuracy, PLEASE RESPOND WITHIN THE DEADLINE!


Subject  Instructions  for  Short  Deadline  -  Actual  Deadline 

The  following  session  will  be  conducted  under  a  VERY  SHORT  response 
deadline.  While  attempting  to  maintain  a  high  level  of  accuracy, 

PLEASE  RESPOND  BEFORE  THE  STIMULUS  DISAPPEARS  FROM  THE  SCREEN! 


Subject  Instructions  for  Short  Deadline  -  Pseudo  Deadline 

The  following  session  will  be  conducted  under  a  VERY  SHORT  response 
deadline.  The  stimulus  may  not  disappear  from  the  screen  but  the 
deadline  cutoff  is  still  in  place.  While  attempting  to  maintain 
a high level of accuracy, PLEASE RESPOND WITHIN THE DEADLINE!

APPENDIX B

SCHEDULES FOR EXTENDED TRIAL LENGTH TESTING

Schedule for Extended Trial Study - Group A1
MONDAY | TUESDAY | WEDNESDAY | THURSDAY | FRIDAY

Schedule for Extended Trial Study - Group A2

APPENDIX C

STRES NORMATIVE DATA

UNIVARIATE SUMMARY FOR STRES

STRES Grammatical Reasoning
Mean Response Time
(msec)

UNIVARIATE SUMMARY FOR STRES MATHEMATICAL

STRES Mathematical Processing
Mean Response Time
(msec)

UNIVARIATE SUMMARY FOR STRES MEMORY SEARCH 2

STRES Sternberg-2
Mean Response Time
(msec)

UNIVARIATE SUMMARY FOR STRES MEMORY SEARCH 4

STRES Sternberg-4
Mean Response Time
(msec)

UNIVARIATE SUMMARY FOR STRES SPATIAL




STRES  Spatial  Processing 
Mean  Response  Time 
(msec) 
UNIVARIATE SUMMARY FOR STRES UNSTABLE TRACKING

UNIVARIATE SUMMARY FOR STRES REACT TASK

STRES Reaction Time - CODED (2)
Mean Response Time
(msec)

STRES Reaction Time - CODED (2)
Percentage Correct

UNIVARIATE SUMMARY FOR STRES REACT TASK

UNIVARIATE SUMMARY FOR STRES UNSTABLE

APPENDIX D

UNIVARIATE SUMMARY FOR CTS MATHEMATICAL

UNIVARIATE SUMMARY FOR CTS UNSTABLE TRACKING

APPENDIX E

WRPAB NORMATIVE DATA

UNIVARIATE SUMMARY FOR WRAIR INTERVAL

APPENDIX F

ONE-WEEK AND THREE-WEEK RETEST DATA
Percent Correct
[Figure: percent correct by trial (1-7); series are STRES, CTS '88, and CTS '91.]

Unstable Tracking
Edge Violations
[Figure: edge violations by trial (1-7); series are STRES, CTS '88, and CTS '91.]

Unstable Tracking
RMS Error
[Figure: RMS error by trial (1-7); series are STRES, CTS '88, and CTS '91.]
APPENDIX I

UNIVERSITY OF OKLAHOMA DATA
VS.
ARMSTRONG LABORATORY DATA
STRES Grammatical Reasoning
Mean Response Time
(msec)

STRES Grammatical Reasoning
Percent Correct

[Figures: both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]


STRES Mathematical Processing
Mean Response Time
(msec)

STRES Mathematical Processing
Percent Correct

[Figures: both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Sternberg-2
Mean Response Time
(msec)

STRES Sternberg-2
Percent Correct

[Figures: both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Sternberg-4
Mean Response Time
(msec)

STRES Sternberg-4
Percent Correct

[Figures: both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Spatial Processing
Mean Response Time
(msec)

STRES Unstable Tracking
Edge Violations

STRES Unstable Tracking
RMS Error

[Figures: panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Reaction Time - BASIC (1)
Mean Response Time
(msec)

STRES Reaction Time - BASIC (1)
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Reaction Time - CODED (2)
Percent Correct

[Figure: trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Reaction Time - UNCERT (3)
Mean Response Time
(msec)

STRES Reaction Time - UNCERT (3)
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Reaction Time - DOUBLE (4)
Mean Response Time
(msec)

STRES Reaction Time - DOUBLE (4)
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
STRES Reaction Time - INVERT (5)
Mean Response Time
(msec)

STRES Reaction Time - INVERT (5)
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]


STRES Reaction Time - BASIC (6)
Mean Response Time
(msec)

STRES Reaction Time - BASIC (6)
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]


CTS Grammatical Reasoning
Mean Response Time
(msec)

CTS Grammatical Reasoning
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]


CTS Mathematical Processing
Mean Response Time
(msec)

CTS Mathematical Processing
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
CTS Memory Search-4
Mean Response Time
(msec)

CTS Memory Search-4
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis.]
CTS Spatial Processing
Mean Response Time
(msec)

CTS Spatial Processing
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
CTS Unstable Tracking
Edge Violations

CTS Unstable Tracking
RMS Error

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
WRAIR Manikin
Mean Response Time
(msec)

WRAIR Manikin
Percent Correct

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
WRAIR Time Wall
Mean
(msec)

WRAIR Time Wall
Standard Deviation
(msec)

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
WRAIR Interval Production
Mean
(msec)

WRAIR Interval Production
Standard Deviation
(msec)

[Figures: both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are Armstrong Lab, OU Phase 1, and OU Phase 2.]
APPENDIX J

BATTERY SEQUENCE DATA
STRES Grammatical Reasoning
Mean Response Time
(msec)

STRES Grammatical Reasoning
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Mathematical Processing
Mean Response Time
(msec)

STRES Mathematical Processing
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Sternberg-2
Mean Response Time
(msec)

STRES Sternberg-2
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Sternberg-4
Mean Response Time
(msec)

STRES Sternberg-4
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Spatial Processing
Mean Response Time
(msec)

STRES Spatial Processing
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Unstable Tracking
Edge Violations

STRES Unstable Tracking
RMS Error

[Figures: Battery Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Reaction Time - BASIC (1)
Mean Response Time
(msec)

STRES Reaction Time - BASIC (1)
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Reaction Time - CODED (2)
Mean Response Time
(msec)

STRES Reaction Time - CODED (2)
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Reaction Time - UNCERT (3)
Mean Response Time
(msec)

STRES Reaction Time - UNCERT (3)
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences (legend entries legible here: S-C-W, S-W-C, W-C-S).]
STRES Reaction Time - DOUBLE (4)
Mean Response Time
(msec)

STRES Reaction Time - DOUBLE (4)
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Reaction Time - INVERT (5)
Mean Response Time
(msec)

STRES Reaction Time - INVERT (5)
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
STRES Reaction Time - BASIC (6)
Mean Response Time
(msec)

STRES Reaction Time - BASIC (6)
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
CTS Grammatical Reasoning
Mean Response Time
(msec)

CTS Grammatical Reasoning
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences (legend entries legible here: S-W-C, W-C-S).]


CTS Mathematical Processing
Mean Response Time
(msec)

CTS Mathematical Processing
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
CTS Memory Search-4
Mean Response Time
(msec)

CTS Memory Search-4
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
CTS Spatial Processing
Mean Response Time
(msec)

CTS Spatial Processing
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
CTS Unstable Tracking
Edge Violations

CTS Unstable Tracking
RMS Error

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
WRAIR Manikin
Mean Response Time
(msec)

WRAIR Manikin
Percent Correct

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
WRAIR Time Wall
Mean
(msec)

WRAIR Time Wall
Standard Deviation
(msec)

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]


WRAIR Interval Production
Mean
(msec)

WRAIR Interval Production
Standard Deviation
(msec)

[Figures: Battery Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are battery sequences C-W-S, S-C-W, S-W-C, and W-C-S.]
APPENDIX K

TASK ORDER DATA
STRES Grammatical Reasoning
Mean Response Time
(msec)

STRES Grammatical Reasoning
Percent Correct

[Figures: Task Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
STRES Mathematical Processing
Mean Response Time
(msec)

STRES Mathematical Processing
Percent Correct

[Figures: Task Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
STRES Sternberg-2
Mean Response Time
(msec)

STRES Sternberg-2
Percent Correct

[Figures: Task Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
STRES Sternberg-4
Mean Response Time
(msec)

STRES Sternberg-4
Percent Correct

[Figures: Task Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
STRES Spatial Processing
Mean Response Time
(msec)

STRES Spatial Processing
Percent Correct

[Figures: Task Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
STRES Unstable Tracking
Edge Violations

STRES Unstable Tracking
RMS Error

[Figures: Task Sequence Effect. Both panels plot sessions 1a-5b (Training) and 1a-2b (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
CTS Grammatical Reasoning
Mean Response Time
(msec)

CTS Grammatical Reasoning
Percent Correct

[Figures: Task Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
CTS Mathematical Processing
Mean Response Time
(msec)

CTS Mathematical Processing
Percent Correct

[Figures: Task Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]


CTS Memory Search-4
Mean Response Time
(msec)

CTS Memory Search-4
Percent Correct

[Figures: Task Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
CTS Spatial Processing
Mean Response Time
(msec)

CTS Spatial Processing
Percent Correct

[Figures: Task Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]
CTS Unstable Tracking
Edge Violations

CTS Unstable Tracking
RMS Error

[Figures: Task Sequence Effect. Both panels plot trials T1-T5 (Training) and B1-B2 (Baseline) on the x-axis; series are task orders MP_MS_SP_UT_GR, MS_GR_MP_UT_SP, UT_SP_GR_MP_MS, and GR_MP_UT_SP_MS.]